CHAPTER 1


INTRODUCTION 


1.1.    BACKGROUND AND JUSTIFICATION

In  recent  years,  there  has  been  a  substantial  increase  in 
the  number  of  individuals  and  organizations  using  electronic 
mail facilities. Most electronic mail systems, including
those of the Unix system, provide the user with facilities to
create, send, receive, save and retrieve messages. The 4.3
BSD Unix mail system provides the user with several methods
for  accessing  mail.  Given  the  current  usage  of  the  Unix  mail 
facility,  it  is  evident  that  the  improvement  and  enhancement 
of  the  Unix  mail  system  is  a  matter  of  considerable 
importance. 

At the simplest level, it would be desirable that a user
should be able to retrieve messages via the following message
items:

Date of receipt

Date of sending

Sender's identity

Recipient's identity

Subject of the message

Words or a phrase in a message body.


The QIRM (Query system for Information Retrieval In a
Mailbox) system, which was designed by the author and is
described and implemented in the present work, relies on
searching for information in the headers or the bodies of
messages which have already been saved in the system mailbox
file. The QIRM system is capable of searching the complete
text of all messages for words or phrases specified by the
user. The retrieval functions of this system are flexible
enough to permit the user to categorize the desired
message(s). The requested information is retrieved
according to the fields and conditions specified.

A  query  language  provides  a  convenient  scheme  for 
retrieving  information  from  databases.  If  we  view  the 
mailbox  as  a  repository  of  stored  messages  (i.e.,  as  a 
database  file),  we  can  employ  queries  expressed  in  some 
'mail-query  language'  to  extract  information  from  the 
mailbox. 

In the present application, Lisp has been chosen as the
host language for implementing the mail query language. One
major factor which motivated this decision is the
symbol-manipulation capabilities provided by Lisp. A central role
is  played  in  QIRM's  implementation  by  a  semantic  net  data 
structure  which  determines  the  correspondence  between  symbols 
and  the  roles  they  play  within  a  message.  Lisp  maintains  all 
of  the  properties  and  values  together  in  a  property  list 
associated  with  each  atom.  These  property  lists  constitute  a 
simple kind of database. For an example of the use of
property lists, consider a database of information about the
messages in a mailbox. A property list associated with each
message (atom) could be used to cache its property values
with corresponding property names (e.g., sending date,
receiving date, sender's id-name, recipient's id-name,
status, subject, and message body). The global variable
MAIL could be used to hold the list of all the messages in a
mailbox. By employing the various facilities provided by
Lisp, the data from a 'mailbox file' can be embodied in a
database by using the concept of 'semantic nets'.
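
The property-list idea can be sketched in a few lines. The sketch below
is only an illustration in Python, not the Franz Lisp code of the QIRM
system: a dictionary stands in for the property list of each message
atom, and the helper names (put_message, get_property) and the sample
messages are assumptions made up for the example.

```python
# Illustrative sketch only: Python dicts stand in for Lisp property lists.
# put_message, get_property and the sample data are hypothetical.

MAIL = []  # plays the role of the global variable MAIL described in the text


def put_message(msg_id, **properties):
    """Attach property names and values to a message and record it in MAIL."""
    message = {"id": msg_id, "plist": dict(properties)}
    MAIL.append(message)
    return message


def get_property(message, name):
    """Return the value stored under a property name, or None if it is absent."""
    return message["plist"].get(name)


put_message("msg1", sender="smith", recipient="jones", status="OLD",
            subject="Budget meeting",
            body="The budget meeting is moved to Friday.")
put_message("msg2", sender="lee", recipient="jones", status="NEW",
            subject="Lisp manual",
            body="The Franz Lisp manual has arrived.")

# Retrieve the subject of every message whose status property is OLD.
old_subjects = [get_property(m, "subject") for m in MAIL
                if get_property(m, "status") == "OLD"]
print(old_subjects)
```

Keeping every field of a message under a named property is what lets a
later query refer to 'subject' or 'status' without knowing how the
mailbox file itself is laid out.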

In  order  to  enhance  the  functionality  of  the  mail  system, 
the  messages  which  the  system  handles  have  to  be  interpreted 
at  least  partially.  In  this  way,  users  can  query  the  message 
system  to  find  messages.  The  message  system  needs  some 
guidance  if  it  is  to  interpret  messages  correctly.  Such 
guidance  can  be  provided  by  superimposing  some  structure  on 
the  messages.  The  breakup  of  the  header  into  fields  and  the 
interpretation  of  each  field  provide  a  suitable  structure 
which  can  be  used  for  locating  messages.  Since  the  facility 
allows  the  user  to  specify  the  partial  contents  of  a 
message,  an  additional  structure  based  on  the  message 
contents  can  be  superimposed.  This  structure,  which  relies 
on  the  various  information  fields  of  messages,  can  be 
implemented  by  employing  several  concepts  and  techniques  of 
Artificial  Intelligence  such  as  frames,  production  rules, 
inheritance [4], and semantic nets.

Even though QIRM is based on such primitive techniques of
artificial intelligence as semantic nets, property lists and
matching, the prototype system could be used as the first
stage in the design and development of an intelligent system
for electronic mail handling. The approach which treats
messages according to their contents encourages the
integration of a message system with office functions
performed on messages. In addition, this approach allows the
integration of message and database facilities. Such
versatility is one of the goals of office automation [5].

1.2.    REPORT  ORGANIZATION 

The  rest  of  this  report  is  organized  into  three  main 
chapters.  Chapter  2  provides  a  review  of  the  literature 
dealing  with  the  concepts  employed  in  the  present 
implementation.  A  brief  overview  of  electronic  mailbox 
systems  is  provided  along  with  definitions  of  relevant  terms. 
Some  notable  aspects  of  query  languages,  semantic  nets,  and 
the  language  used  to  implement  the  project  are  also 
presented.  At  the  end  of  that  chapter,  research  work  related 
to this report is summarized. Chapter 3 deals with the
design  of  the  QIRM  system.  Along  with  the  presentation  of 
the  objectives  of  the  system  and  the  environment  employed, 
there  are  descriptions  of  how  the  database  is  constructed, 
how  the  interface  to  the  query  system  works,  and  how  the 
tasks of data searching, extraction, and output generation
are  performed.  Chapter  4  offers  concluding  remarks  and 
suggests  future  extensions  of  this  work. 

There are two appendices. Appendix A is a manual
providing the syntax and user interface of this system.
Appendix B is a listing of the source code of the
implementation.


CHAPTER  2 
REVIEW  OF  THE  LITERATURE 

2.1.   ELECTRONIC  MAILBOX  SYSTEMS 

This  section  consists  of  three  parts.  The  first  part 
defines  the  terminology  related  to  the  concept  of  an 
electronic  mailbox  system.  In  the  second  part  of  this 
section,  the  basic  mailbox  facilities  are  briefly  described. 
Various  approaches  towards  structuring  mailbox  facilities  are 
presented  in  the  last  part.  One  of  these  approaches  forms 
the basis for the implementation of the QIRM system.

2.1.1.   Terminology

There is some inconsistency in the use of terminology
such as Electronic Mail, Mailbox, and Electronic Message
Systems. In fact, different people use each of these terms
to mean different things. Consequently, it is not possible
to provide clear-cut, commonly accepted definitions for them.
Another problem is that the technology is moving so rapidly
that the concepts and terminology have to change quite often
just to keep pace. Nevertheless, in order to place some of
these terms in context, we shall define them by combining
definitions in several references [1, 6, 10, 11, 20] based on
an office automation approach.

Message refers to a single letter from a sender to be
transferred via the system. It usually has two parts: an
envelope and contents. The envelope contains information
needed by the system to get the message to the correct
mailbox and typically contains the name and address of the
mailbox to which the message is being sent. The contents can
be separated into two parts: a header and a body. Header
information contains predefined fields associated with the
messaging process such as 'subject', 'time sent', 'to' and
'status'. A status field contains such information as
'Has the message been read or not?' or 'Is the message old or
new?'. The system will request any essential envelope and
header information before sending a message. The message
body is free-format text. Some kind of word processing
facility is usually provided for the sender to input and edit
the body of the text at will (see Figure 1).
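
The envelope/contents split described above maps naturally onto nested
records. The following Python sketch is one possible rendering, not a
definition taken from any mail system: the class and field names are
assumptions, as is the choice of which fields count as "essential"
before a message may be sent.

```python
# Illustrative sketch only: class names, field names and the set of
# "essential" fields are assumptions, not part of any real mail system.
from dataclasses import dataclass


@dataclass
class Envelope:
    mailbox_name: str      # name of the destination mailbox
    mailbox_address: str   # address of the destination mailbox


@dataclass
class Header:
    subject: str
    time_sent: str
    to: str
    status: str = "new"    # e.g. read/unread, old/new


@dataclass
class Contents:
    header: Header
    body: str              # free-format text


@dataclass
class Message:
    envelope: Envelope
    contents: Contents


def is_sendable(message):
    """True when the (assumed) essential envelope and header fields are filled in."""
    envelope = message.envelope
    header = message.contents.header
    return all([envelope.mailbox_name, envelope.mailbox_address,
                header.to, header.subject])


letter = Message(
    Envelope("jones", "/usr/spool/mail/jones"),
    Contents(Header("Budget meeting", "Mon 10:15", "jones"),
             "The budget meeting is moved to Friday."),
)
print(is_sendable(letter))
```

A system built this way would prompt the sender for any missing field
until is_sendable holds, which corresponds to the request for essential
envelope and header information mentioned in the text.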

Electronic  Mail  is  a  collection  of  electronic  text 
messages to be transferred uni-directionally, via a
computer-assisted communication system, from an identified
sending  party  to  one  or  more  identified  receiving  parties.  It 
must  be  mentioned  that  the  term  is  often  used  to  refer  to  the 
electronic  distribution  of  complete  documents,  composed  of 
text,  data,  images  and  other  forms  of  information. 

Electronic mailbox systems or Electronic mail systems are
terms used to describe computer-operated message systems which hold
messages in mailboxes, thereby allowing the user to access
and send messages at times and places which the user chooses.
Mailbox is a term sometimes used to refer to a store of
messages, or to a program which provides users with access to
these messages. We shall use the term "mailbox" to refer to
a file of incoming messages. In the Unix system, mailboxes
are typically located in files corresponding to users' login
names existing in the directory "/usr/spool/mail". Along with
a mailbox, the system has the facility of stored, delayed
communication. Non-delayed systems do not provide storage
capacity. Therefore, they only support instantaneous
communication. Conventional telex and telephone systems, and
the direct terminal-to-terminal working on a multi-access
computer system are non-delayed mail systems. Store-and-forward
telex, telephone answering systems and electronic
mail systems are examples of delayed systems.

Figure 1. Message Structure

Electronic Message Systems (EMS) refers to the entire
range of electronic communication systems. EMS provides
communications service to users based on the transmission of
text, voice, image, or any combination of these three.
Thus, Facsimile, Video Conferencing, PABX-based telephone
systems, Voice mailbox systems, Telex, and Teletex are
examples of EMS. However, the term EMS is sometimes used to
refer specifically to Electronic Mailbox Systems.

2.1.2.   Basic  Mailbox  Facilities 

All  Electronic  mail  systems  including  those  in  the  Unix 
system  have  a  wide  range  of  basic  facilities.  Such 
facilities  can  be  categorized  as  follows. 

2.1.2.1.   Creating Messages

An editing facility is required for creating messages.
The sophistication of this facility varies from system to
system.  Some  systems  only  allow  editing  within  the  current 
line  being  input,  while  others,  including  Unix,  provide  more 
comprehensive  word  processing  facilities  with  sophisticated 
features  such  as  the  ability  to  move  paragraphs  around. 

2.1.2.2.   Sending  Messages 

The  capability  to  send  a  single  message  simultaneously  to 
several  people  is  an  important  feature  provided  by  an 
electronic  mailbox  system.  This  is  achieved  by  permitting 
the specification of multiple addresses. Some systems,
including Unix, provide a more powerful version of this
facility by allowing distribution lists, already
filed away in the system, to be called up by using a short
abbreviation.  This  is  a  powerful  feature  since  it  allows 
perhaps  hundreds  of  copies  to  be  sent  by  a  user  with  no  work 
beyond  that  required  to  send  just  one  copy. 

2.1.2.3.   Receiving Messages

Most electronic mailbox systems provide a scanning
facility for detecting waiting messages, and associated
facilities that allow the user to select the order for
accessing messages. Reply and forwarding facilities for
assisting mailbox owners are also provided. The reply
facility automatically completes the header information such
as 'to' and 'subject' by copying the information from the
message being read. Forwarding allows a message to be sent
on to another mailbox user along with any annotation that the
sender wishes to add. Having read, replied to, or forwarded
a message, the recipient can either delete, file or leave the
message pending within his mailbox.

2.1.2.4.   Filing and Retrieval

Some  sort  of  message-filing  capability  is  offered  by  most 
mailbox  systems.  However,  the  size  of  the  store  may  vary 
from  system  to  system,  as  may  the  index  facility  associated 
with  the  file.  Search  and  retrieval  of  information  held  in 
these  electronic  files  is  usually  undertaken  using  indexing 
classifications. The Unix mail system allows a user to
retrieve messages by message number, subject, and sender's
id.

2.1.3.   Mailbox  System  Structuring 

More  formal  applications  can  be  carried  out  via  mailbox 
systems  if  additional  facilities  -  mailbox  structures  -  are 
provided.  Mailbox  structures  refer  to  rules  which  organize 
mailbox  messaging  so  that  specific  objectives  for  the 
communication can be met. The rules can be embedded in the
software and imposed automatically by a system. They can
also be applied and policed by the user of the system [1, 6].

Mailbox communication can be structured by organizing
the information of messages [1, 6]. Software can sort
communication into categories based on the content of
messages and the interest groups with self-selected
memberships, for instance, or can permit recipients to route
or filter messages by the information of messages such as
keyword, subject, and author.

Structures  can  act  as  an  aid  to  making  decisions,  getting 
agreement,  and  controlling  work  that  is  done  via  a  mailbox 
system [1, 6]. Message summarizing or condensation can be
accomplished  by  structuring  the  form  of  inputs.  Senders 
might  be  required  to  adhere  to  length  limitations,  or  to  use 
votes  or  other  numeric  estimates  instead  of  full  messages. 
Summarizing  can  also  be  performed  by  human  digesters  who  read 
incoming  items,  discard  irrelevant  ones,  and  summarize  others 
before  posting  them. 

In  order  to  maintain  ordered  and  useful  communications, 
it  may  be  necessary  to  control  the  access  to  the  available 
facilities.  A  designated  human  leader  or  a  software 
structure  can  help  to  perform  these  tasks:  allocation  and 
removal of mailboxes, allocation of basic facilities (e.g.,
'read only' or 'read and write'), creation and control of
distribution lists or activities. Social pressure can also
be used for this purpose: often group members collectively
censure an errant member. If pen names are used or anonymity
is maintained, individuals can vote to sanction or criticize
errant members without embarrassing themselves [1, 2, 4].

The QIRM system is based on "structuring communications
by organizing the information of messages". Messages in the
mailbox file can be organized into a database. This allows
the retrieval of mail items via a user-friendly query
language. It is an obvious advantage to be able to search
for messages using particular keys and conditions specified
by the user. In the QIRM system, the user can query the
retrieval system to find some specific information on
sender's id, subject, sending date, receiving date, message
number, status, recipient's id, and word or phrase in message
body. In order to answer a query with proper information,
the system imposes some structure on the messages. This
structure is known to the system and used for the
interpretation of the message.

2.2.   QUERY LANGUAGES

2.2.1.   Overview

One  of  the  main  objectives  of  organizing  a  large  quantity 
of data in computer storage is to be able to retrieve,
modify,  or  insert  any  subset  of  the  data.  The  user 
communicates  his  requests  for  information  in  the  form  of 
queries.  The  computer  receives  the  queries,  analyzes  them, 
and  proceeds  to  search  and  operate  on  the  desired 
information.  When  a  database  is  organized,  considerable 
attention  is  given  to  the  set  of  queries  that  will  be 
directed  at  it.  This  enables  the  choice  of  the  most  suitable 
database  organization  for  the  query  set.  The  study  of 
queries plays an important role in database organizations for
information  retrieval. 

A query language provides a framework for information
retrieval. The query language is typically a high-level,
non-procedural language, in that the user only tells the
system what information is required, rather than how it must
be retrieved. The query language must also be complete. This
implies that all legitimate data and data relationships are
accessible through the operators defined in the query
language [19].

A query language is a generalization of the predicate
calculus, which is used to represent statements about the
relation between attributes and values, or between two
attributes, and which is used to identify a set (class) of
objects [21]. The central feature is a quantifier function
which  is  able  to  express,  in  a  simple  manner,  the 
restrictions  placed  on  a  database-retrieval  request  by  the 
user. 

The formal query language contains three types of
objects: designators, which name classes of objects in the
database; propositions, which are formed from predicates with
designators as arguments; and commands, which initiate
actions [21]. Thus, in order to obtain the subject and
sender's name of a message with status 'OLD', the following
query might be formulated.

    PRINT   Subject, Sender
    WHERE   Status = 'OLD'

'Subject' and 'Sender' are designators to specify what
information is needed for the answer; 'Status' and 'OLD' are
designators which are arguments of the predicate '='; and
"PRINT" is a command for specifying to the query analyzer the
form in which the resulting data should be presented or the
manipulation which must be performed on the data [7].
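
To make the three kinds of objects concrete, the query above can be
evaluated over a small in-memory mailbox. The Python sketch below is a
hedged illustration, not the QIRM evaluator: the function run_query, the
dictionary representation of messages and the sample data are all
assumptions introduced for the example.

```python
# Illustrative sketch only: run_query and the sample mailbox are hypothetical.
import operator

# Predicates known to this toy evaluator; only '=' is needed for the example.
PREDICATES = {"=": operator.eq}


def run_query(command, designators, proposition, messages):
    """Evaluate `command designators WHERE proposition` over a list of messages.

    proposition is a triple (designator, predicate symbol, value).
    """
    field, predicate, value = proposition
    test = PREDICATES[predicate]
    # The proposition selects the matching messages ...
    selected = [m for m in messages if test(m.get(field), value)]
    # ... the designators name the information wanted from each of them ...
    rows = [tuple(m.get(d) for d in designators) for m in selected]
    # ... and the command says what to do with the result.
    if command == "PRINT":
        for row in rows:
            print(*row, sep="\t")
    return rows


mailbox = [
    {"Subject": "Budget meeting", "Sender": "smith", "Status": "OLD"},
    {"Subject": "Lisp manual", "Sender": "lee", "Status": "NEW"},
]

# PRINT Subject, Sender WHERE Status = 'OLD'
result = run_query("PRINT", ["Subject", "Sender"], ("Status", "=", "OLD"), mailbox)
```

Note that the caller states only what is wanted (fields and condition);
how the mailbox is scanned is left entirely to run_query, which is the
sense in which such a language is non-procedural.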

2.2.2.   Query Processing

After  a  query  has  been  input  by  a  user,  it  is  worked  on 
by  a  query  processor.  The  flow  diagram  shown  in  Figure  2 
demonstrates  how  a  query  processor  interfaces  with  a  data 
management  system.  The  query  processor  receives  a  query  from 
the  user,  and  then  parses  and  translates  the  query.  The 
query  processor  utilizes  information  about  the  structure  of 
the database known as the database description. This
information is needed so that the parser can check the use of
attributes of relations and, if a comparison is made between
attributes, whether the domains are compatible.
The  next  step  involves  the  determination  of  a  schedule  for 
processing  the  query.  The  set  relationships  among  the  data 
items  that  are  required  for  answering  the  given  query  are 
determined  during  this  stage.  Optimization  of  the  query  may 
also  be  attempted  at  this  stage,  since  the  speed  with  which 
the  query  can  be  answered  may  depend  on  the  choice  made  by 
the  query  processor  concerning  the  sequence  of  steps  to  be 
taken  by  the  system.  The  third  step  is  the  execution  of  the 
scheduled  operations  which  involves  actually  searching  the 
database  and  retrieving  the  desired  data.  The  final  step  is 
report generation, which involves producing the desired
output format for the data requested [15].
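
The four stages described above (parsing against the database
description, scheduling, execution, and report generation) can be
sketched as four small functions. This is a minimal Python illustration
under stated assumptions, not the query processor of any particular
system: the function names, the one-condition query syntax and the
sample records are all made up for the example.

```python
# Illustrative sketch only: all names, the tiny query grammar and the
# sample records are assumptions made for this example.

# The database description: attribute names and their domains.
DATABASE_DESCRIPTION = {"Number": int, "Subject": str, "Sender": str, "Status": str}


def parse(query_text, description):
    """Step 1: parse 'CMD f1, f2 WHERE attr = value' and check it against the description."""
    head, _, condition = query_text.partition(" WHERE ")
    command, _, field_list = head.partition(" ")
    fields = [f.strip() for f in field_list.split(",")]
    attribute, _, literal = condition.partition("=")
    attribute = attribute.strip()
    literal = literal.strip().strip("'")
    for name in fields + [attribute]:
        if name not in description:
            raise ValueError("unknown attribute: " + repr(name))
    # Domain check: the literal must be convertible to the attribute's domain.
    value = description[attribute](literal)
    return {"command": command, "fields": fields,
            "attribute": attribute, "value": value}


def schedule(parsed):
    """Step 2: choose the order of operations (select first, so less data is projected)."""
    return [("select", parsed["attribute"], parsed["value"]),
            ("project", parsed["fields"])]


def execute(plan, records):
    """Step 3: carry out the scheduled operations against the stored records."""
    rows = records
    for step in plan:
        if step[0] == "select":
            _, attribute, value = step
            rows = [r for r in rows if r[attribute] == value]
        elif step[0] == "project":
            rows = [tuple(r[f] for f in step[1]) for r in rows]
    return rows


def report(fields, rows):
    """Step 4: produce the output format (a header line plus one line per row)."""
    lines = ["\t".join(fields)]
    lines += ["\t".join(str(v) for v in row) for row in rows]
    return "\n".join(lines)


records = [
    {"Number": 1, "Subject": "Budget meeting", "Sender": "smith", "Status": "OLD"},
    {"Number": 2, "Subject": "Lisp manual", "Sender": "lee", "Status": "NEW"},
]
parsed = parse("PRINT Subject, Sender WHERE Status = 'OLD'", DATABASE_DESCRIPTION)
output = report(parsed["fields"], execute(schedule(parsed), records))
print(output)
```

Keeping the stages separate means that an optimizer could rewrite the
plan returned by schedule without touching the parser or the report
generator.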

2.2.3.   Structured Query Language (SQL)

The most widely used query language is probably SQL [13].
SQL is based on the relational tuple calculus, which is a
notation for expressing the definition of a relation in terms
of tuples belonging to some existing tables. Structured
English Query Language (SEQUEL) is an older version of SQL
which was developed by Chamberlin et al. of IBM as part of
the SYSTEM R research project. Although it is termed a
query language, SQL permits updates as well as data
definition. Its facilities are summarized in Figure 3 [14: p.
110]. SQL can be used both as a stand-alone language and
as a data sublanguage embedded in PL/1 or COBOL. It is
available in many well-known relational database management
systems such as SYSTEM R, SQL/DS and DB2 [13].

Since  data  retrieval  is  the  focus  of  interest  of  this 
report,  this  section  reviews  in  detail  only  the  query  portion 
of  SQL.  In  SQL  the  basic  query  construct  is  the  SELECT- 
FROM-WHERE  command.  This  construct  forms  the  basis  for 
retrieval. 

Suppose that it is necessary to access the name of
employee number 43 from the table EMPLOYEES. The appropriate
command would be:

    SELECT   ENAME
    FROM     EMPLOYEES
    WHERE    ENO = 43

The SELECT clause specifies the names of the fields
(columns or attributes) that are to be selected - in this
case one, but there can be several. The relation (table) to
be used is listed after FROM. The WHERE clause contains a
predicate which allows logical operators (NOT, AND, OR),
standard comparison operators, IN, ALL and some other
operators. The attributes in the predicate of the WHERE
clause must be drawn from the tables of an appropriate FROM
clause.

Query Command

    SELECT          Retrieve data from one or more tables

Data Manipulation Commands

    INSERT          Adds one or more rows into existing table
    UPDATE          Changes data in one or more rows of a table
    DELETE          Removes one or more rows from a table

Data Definition Commands

    CREATE TABLE    Defines the structure of a new table to SQL
    DROP TABLE      Removes the definition of the table from the system
    ALTER TABLE     Adds a new column to a table definition
    CREATE INDEX    Allows a table to be indexed on one or more columns
    DROP INDEX      Removes the index from the system
    CREATE VIEW     Defines a user view of part of the db
    DROP VIEW       Removes the view from the system

Figure 3. Query, Manipulation, and Definition Commands in SQL

It is possible to specify just which columns are wanted.
If all the details are needed an asterisk can be used:

    SELECT   *
    FROM     EMPLOYEES
    WHERE    ENO = 43

This query would give the whole record for employee 43.
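
Both forms of the query can be tried out with the sqlite3 module from
the Python standard library. The sketch below is illustrative only:
SQLite is used merely as a convenient SQL engine (it is not one of the
systems named in this report), and the DEPT column and the two sample
rows are assumptions added so that the table has something to return.

```python
# Illustrative sketch only: SQLite stands in for an SQL system; the DEPT
# column and the sample rows are made up for this example.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPLOYEES (ENO INTEGER PRIMARY KEY, ENAME TEXT, DEPT TEXT)")
conn.executemany("INSERT INTO EMPLOYEES VALUES (?, ?, ?)",
                 [(42, "Adams", "Sales"), (43, "Baker", "Research")])

# One named column for employee 43.
name_rows = conn.execute(
    "SELECT ENAME FROM EMPLOYEES WHERE ENO = 43").fetchall()

# The asterisk returns the whole record for employee 43.
full_rows = conn.execute(
    "SELECT * FROM EMPLOYEES WHERE ENO = 43").fetchall()

print(name_rows)
print(full_rows)
conn.close()
```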

Compound selections can be specified in the WHERE clause.
More than one table can be used in the selection. The first
scheme for doing this is to embed or nest a SELECT-FROM-WHERE
command inside the WHERE clause, so that some column value(s)
are matched with values selected from another table:

    SELECT   columns chosen from table A
    FROM     records in table A
    WHERE    table A column =  SELECT   table B column
                               FROM     records in table B
                               WHERE    condition

In effect, the query examines or extracts from table B a
set of records which match the 'condition' specified in the
nested WHERE clause. The value in some attribute column of
table B in these selected table B records is then matched
with a corresponding attribute in table A ("WHERE table A
column = "). Records from table A are selected where the
match occurs. The final result contains the 'attributes
chosen from this set of matching records'. The attribute
chosen for matching must exist in both tables. It might be
employee number for instance, or date.
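
A nested query of this shape can be sketched with sqlite3 as follows.
The example is an illustration under assumptions: the DEPARTMENTS table,
the DNO and LOCATION columns and the sample rows are invented for the
purpose, and the modern form with IN and a parenthesized subquery is
used because the inner SELECT may return more than one value.

```python
# Illustrative sketch only: the DEPARTMENTS table, the DNO/LOCATION columns
# and all sample rows are hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EMPLOYEES (ENO INTEGER, ENAME TEXT, DNO INTEGER);
    CREATE TABLE DEPARTMENTS (DNO INTEGER, LOCATION TEXT);
    INSERT INTO EMPLOYEES VALUES (42, 'Adams', 1), (43, 'Baker', 2), (44, 'Clark', 2);
    INSERT INTO DEPARTMENTS VALUES (1, 'Topeka'), (2, 'Manhattan');
""")

# The inner SELECT extracts the matching DNO values from table B (DEPARTMENTS);
# the outer SELECT keeps the table A (EMPLOYEES) rows whose DNO matches one of them.
nested_rows = conn.execute("""
    SELECT ENAME
    FROM   EMPLOYEES
    WHERE  DNO IN (SELECT DNO
                   FROM   DEPARTMENTS
                   WHERE  LOCATION = 'Manhattan')
    ORDER  BY ENAME
""").fetchall()

print(nested_rows)
conn.close()
```

The matching attribute (DNO here) appears in both tables, and the
result draws its columns from EMPLOYEES alone, as the text requires of
the subquery strategy.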

SQL  permits  nested  blocks  to  an  arbitrary  depth  as  long 
as  the  desired  result  (the  answer)  comes  from  a  single 
relation.  If,  however,  the  result  comes  from  two  or  more 
relations,  the  subquery  strategy  does  not  work. 
Consequently,  it  is  necessary  to  join  the  relations  together. 
This  joining  method  is  the  second  scheme  of  using  two  tables 
in  a  query. 

The WHERE clause links the two tables by specifying the
columns that are to be matched.

    SELECT   columns (from table A & table B)
    FROM     table A, table B
    WHERE    column from A = column from B

This is an implicit join operation followed by a select [12].
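
When the answer needs columns from both tables, the join form applies.
The following sqlite3 sketch is again only an illustration: the
DEPARTMENTS table, its columns and the sample rows are assumptions, not
data from any system discussed in this report.

```python
# Illustrative sketch only: table layout and sample rows are hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EMPLOYEES (ENO INTEGER, ENAME TEXT, DNO INTEGER);
    CREATE TABLE DEPARTMENTS (DNO INTEGER, LOCATION TEXT);
    INSERT INTO EMPLOYEES VALUES (42, 'Adams', 1), (43, 'Baker', 2), (44, 'Clark', 2);
    INSERT INTO DEPARTMENTS VALUES (1, 'Topeka'), (2, 'Manhattan');
""")

# Listing both tables after FROM and equating the shared column in WHERE
# joins them; the result mixes columns from the two relations.
joined_rows = conn.execute("""
    SELECT ENAME, LOCATION
    FROM   EMPLOYEES, DEPARTMENTS
    WHERE  EMPLOYEES.DNO = DEPARTMENTS.DNO
    ORDER  BY ENAME
""").fetchall()

print(joined_rows)
conn.close()
```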

2.3.   SEMANTIC  NETS 

The  semantic  net,  developed  by  Quillian  (1968)  and 
others,  has  been  proposed  as  an  explicit  psychological  model 
of  human  associative  memory.  A  semantic  net  consists  of 
nodes,  which  represent  objects,  concepts,  or  events,  and 
links  between  the  nodes  which  represent  the  relations  between 
the objects. A basic set of primitives is chosen to
describe objects and relationships, and all descriptions are
constructed  from  these  semantic  primitives.  The  number  and 
type  of  primitives  that  form  the  basic  vocabulary  is 
important  because  the  choice  of  primitives  will  determine  the 
expressive  power  of  the  representation. 

Let us suppose that we want to represent the fact that
'All poodles are dogs.' in a semantic network. We can do
this by creating a simple net in which the nodes represent
the objects and the link represents the 'is-a' (or 'subset')
relation between them (Figure 4).

If 'Benji' is a particular poodle, and we want to add the
fact that dogs have tails, then we would add two nodes and
two links; one of the new links is a 'has-part' relation
(Figure 5). This representation enables us to deduce facts
that are not explicitly stated in the network, e.g. that
poodles have tails since they are dogs, and so does Benji
since he is a poodle. This feature is called property
inheritance. Care must be taken to separate generic
concepts (or objects), such as poodles, from a specific token
such as Benji, otherwise errors in deductions can result.
Linking can result in incorrect deductions if the generic and
specific nodes are intermingled, or if the inheritance
characteristics are not carefully isolated [9].
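
Property inheritance over such a net can be sketched in a few lines of
Python. The sketch is an illustration, not the representation used in
QIRM: the class SemanticNet and its methods are hypothetical, and the
separate 'instance-of' link for the token Benji is an assumption made
here to keep the specific token apart from the generic concepts.

```python
# Illustrative sketch only: SemanticNet and the 'instance-of' link name are
# assumptions made for this example.


class SemanticNet:
    def __init__(self):
        # Maps (source node, relation) to the set of target nodes.
        self.links = {}

    def add_link(self, source, relation, target):
        """Add a labelled link from one node to another."""
        self.links.setdefault((source, relation), set()).add(target)

    def ancestors(self, node):
        """Return the node plus everything reachable by instance-of and is-a links."""
        seen, stack = [], [node]
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.append(current)
            for relation in ("instance-of", "is-a"):
                stack.extend(self.links.get((current, relation), ()))
        return seen

    def has_part(self, node, part):
        """Deduce a has-part fact, inheriting it from more general nodes if needed."""
        return any(part in self.links.get((ancestor, "has-part"), ())
                   for ancestor in self.ancestors(node))


net = SemanticNet()
net.add_link("POODLE", "is-a", "DOG")          # generic concept to generic concept
net.add_link("Benji", "instance-of", "POODLE")  # specific token to generic concept
net.add_link("DOG", "has-part", "TAIL")

print(net.has_part("POODLE", "TAIL"))  # inherited from DOG
print(net.has_part("Benji", "TAIL"))   # inherited via POODLE and DOG
```

Only the single fact that dogs have tails is stored; the facts about
poodles and about Benji are deduced by following the links upward.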

The  semantic  network  representation  is  not  a  formal 
mathematical system with unifying principles. Its use tends
to be rather ad hoc, with various researchers employing
different net interpretation schemes based on the same
general concepts [21].

    +--------+
    |  DOG   |
    +--------+
        ^
        |  is-a
        |
    +--------+
    | POODLE |
    +--------+

Figure 4. A Simple Semantic Net

Figure 5. A Semantic net using IS-A link


2.4.   SEMANTIC NETWORKS, RELATIONAL DATABASES, AND QUERY
LANGUAGES

Database  management  systems  typically  incorporate  a 
database  schema.  The  schema  is  actually  expressed  in  terms 
of  a  descriptive  language  called  a  data  model.  The  data 
model  provides  a  set  of  constructs  for  describing  aspects  of 
the  database  and  is  used  by  the  database  management  system 
for  processing  queries. 

The  semantic  network  model  and  the  relational  data  model 
could  be  used  both  for  describing  the  data  and  for  specifying 
queries.  In  processing  queries  using  the  former  scheme, 
special  algorithms  can  be  employed  to  match  the  graph 
corresponding  to  the  query  with  the  graph  representing  the 
data [21]. For the case involving relational notation,
most  queries  can  be  viewed  as  taking  an  entity  that  meets 
certain  criteria,  connecting  it  to  an  entity  of  another  type 
-  perhaps  through  many  relationships,  and  finally  returning 
some  attributes  of  the  resulting  entity. 

2.5.   FRANZ LISP

The language used for implementing the QIRM system is Franz
Lisp, a popular dialect of Lisp. The Lisp language, which
was invented in the late 1950s, has evolved in a number of
different directions. Consequently, unlike many other
languages, there is no such thing as standard Lisp. Among
the most widely used dialects of Lisp today are MacLisp, a
version of Lisp developed at MIT, and InterLisp, developed
at Bolt, Beranek and Newman, and Xerox Palo Alto Research
Center [3].

Franz Lisp was developed at the University of California
at Berkeley and is available under Berkeley Unix [3]. Its
roots are in a PDP-11 Lisp system which originally came from
Harvard. As it grew it adopted features of MacLisp and Lisp
Machine Lisp. Substantial compatibility with other Lisp
dialects (Interlisp, UCIlisp, CMUlisp) is achieved by means
of support packages and compiler switches. The Franz Lisp
system consists of an interpreter, Lisp, and an optimizing
compiler which is named Liszt. The kernel of Franz Lisp is
written almost entirely in the programming language C, with
much of the support written in (compiled) Lisp. For run-time
efficiency, small portions of the kernel are written in
assembly language [16].

Franz  Lisp  is  capable  of  running  large  Lisp  programs  in  a 
timesharing  environment.  It  has  facilities  for  arrays  and 
user-defined structures, along with a user-controlled reader
with character and word macro capabilities which gives the
Lisp programmer the ability to modify the way expressions are
read in by the interpreter. Through the use of read macros,
the user can designate special characters which act in
unusual ways. This gives the user the ability to establish
useful shorthands that simplify some programming tasks.
Also,  Franz  Lisp  can  interact  directly  with  compiled  Lisp,  C, 
Fortran,  and  Pascal  code. 

2.6.  RELATED WORK

Much of the work which has been done on structuring
electronic mailbox communication involves quite different
approaches. Some articles which are representative of these
different methodologies include [2], [5], and [4].
The following sections describe portions of these works.

2.6.1.  Systems for Controlling Group-Communication (SCGC) [2]

It is often easy to send a message to a large number of
people  since  systems  are  often  designed  to  give  the  sender 
too  much  control  of  the  communication  process,  and  the 
receiver  too  little  control.  If  the  receiver  is  given  more 
control  over  the  communication  process,  much  of  the 
electronic  mail  which  is  not  of  interest  to  a  person  can  be 
greatly  reduced.  In  order  to  accomplish  this,  a  structure 
must  be  imposed  on  the  set  of  messages.  Electronic  mail 
systems  thus  need  to  be  more  database  oriented,  like  some  of 
the existing computer conference systems. Even though a
group-communication system can cause information-overload
problems, it provides people with an environment for meeting
and exchanging ideas much more freely than in a pure mail
system. There
exist  several  design  options  for  electronic  message  systems 
to  overcome  the  overload  problems. 

When  a  conference  system  complements  a  message  system, 
the  flow  of  unwanted  messages  is  greatly  reduced.  Instead  of 
delivering  an  unordered  heap  of  messages,  the  system  can 
deliver   a  neatly  structured  database  of   incoming   messages. 
It permits the reader to decide which messages to read
immediately,  which  to  save  for  another  time,  and  which  not  to 
read  at  all. 

Another  way  of  structuring  messages  is  via  comment  trees. 
A  system  can  be  designed  to  store  relations  between  messages, 
where  one  of  them  can  be  a  comment  or  a  reply  to  another 
message.  Thus,  a  set  of  messages  which  refer  to  each  other 
directly  or  indirectly  (comment  tree)  can  be  identified 
automatically  by  the  system. 

Also,  the  storage  and  retrieval  of  messages  can  be 
controlled  by  such  items  as:  keywords,  subject,  and  author. 
The  system  can  be  told  to  deliver  messages  according  to  user- 
specifications  involving  these  items,  thus  giving  the  reader 
more  control  of  what  messages  will  be  read.  For  example,  a 
user  can  read  all  messages  on  a  certain  subject  before 
continuing  with  a  new  subject. 

A  designated  human  leader  for  a  computerized  conference 
can  help  a  group  to  control  its  communication.  A  leader's 
role  includes  editing  a  list  of  items  or  keywords  for 
clarity,  or  deleting  or  moving  inappropriate  items  before 
posting  them  (control  by  selection).  The  process  of 
summarizing  discussions  can  also  be  performed  by  human 
digesters.  People  in  such  roles  abstract  the  discussions  in 
voluminous  open  conferences  into  write-protected  conferences 
containing  only  the  abstracts  (control  by  abstract  writing). 

2.6.2.  A System for Managing Structured Messages (SMSM) [5]

Message  systems  transport  the  messages,  but  do  not  manage 
them.  Database  management  systems  manage  the  information, 
but  do  not  have  any  notion  of  addresses.  Integration  of  the 
facilities  of  both  systems  provides  a  scheme  for  structuring 
mailbox  communication. 

A system using such an approach manages messages as typed
objects which can be stored within a logical unit and
transferred between the units. Such a system provides
manual functions which enable users to find and query
messages by selecting a message type and partially specifying
the contents of the messages in templates. A user can specify
message selection based on various combinations of items
internal to the messages.

Also, the system permits the specification of procedures
which are triggered by the presence of messages and which
automatically manipulate the messages. The automatic
procedures are specified by giving the system some indication
of the pattern or contents of the messages which are desired
and an indication of what the system should do with these
messages. Automatic procedures run regardless of whether the
user who specified the procedure is currently logged in.
Examples of these automatic functions include: coordination
of messages, i.e., acting only when a related set of messages
has been assembled; modification and creation of messages;
filing messages; and forwarding received messages to other
stations according to their contents.

A uniform user interface which is based on 'specification
by example' can be provided to carry out all the manual and
automatic functions. Users query messages by partially
specifying the contents of the messages in the template. The
automatic procedures are also specified by indicating what
messages are to be collected, and what is to be done with
them.

2.6.3.  Intelligent Information-Sharing Systems [4]

The problem of balancing the value of open communication
channels with the cost of information overload has been
expressed by many users of group-communication systems. A
technology that can increase the selectivity with which
information is disseminated should be sought. A prototype
called Information Lens (IL) has been developed using this
concept. This system employs user-interface design and
techniques from Artificial Intelligence such as frames,
production rules, and inheritance. These techniques help
people filter, sort, and prioritize messages that are
addressed to them. They also help users to find useful
messages they would not otherwise have received, via a special
mailbox called 'Anyone'. Messages that have 'Anyone' as an
addressee are automatically delivered to a public mailbox.
Receivers can have interest profiles which automatically
retrieve messages from the public mailbox 'Anyone'.
Also, this system permits semi-structured templates to be
used by senders in message composition. These templates can
also be utilized by recipients to facilitate construction of
a set of rules for filtering and categorizing messages.

Three  different  approaches  to  automated  message  filtering 
are  employed  in  the  Information  Lens  system.  A  cognitive 
filtering  approach  works  by  characterizing  the  contents  of  a 
message  and  the  information  needs  of  potential  message 
recipients  and  then  using  these  representations  to 
intelligently  match  messages  to  receivers.  Decisions  are 
based  on  distribution  lists,  and  either  a  simple  keyword 
search  or  combinations  of  various  conditions  on  fields.  A 
social  filtering  approach  works  by  supporting  the  personal 
and  organizational  interrelationships  of  individuals  in  a 
community.  It  complements  the  cognitive  approach  by  focusing 
on  the  characteristics  of  a  message's  sender,  in  addition  to 
its  topic.  An  economic  filtering  approach  relies  on  various 
kinds  of  cost-benefit  assessments  and  explicit  or  implicit 
pricing  mechanisms.  The  length  of  a  message,  the  number  of 
recipients,  and  the  salary  of  a  recipient  are  some  of  the 
factors  used  to  estimate  its  cost.  The  current  version  of 
this  system  emphasizes  the  cognitive  approach. 

The  Information  Lens  system  is  written  in  the  Interlisp-D 
programming  environment  using  Loops,  and  runs  on  Xerox  1108 
and  1109  processors  connected  by  an  Ethernet.  It  is  built 
on top of an existing electronic mail system. Users can
continue to send and receive their mail as usual, and have
the option of using centrally maintained distribution lists
and manually classifying messages into folders. The system
additionally provides the following important optional
capabilities: (1) Structured message templates are available
for message composition; (2) Senders can include a special
mailbox named "Anyone" (which is a public information file)
as an addressee of a message; (3) Receivers can specify
rules  to  automatically  filter  and  classify  messages  arriving 
in  their  mailbox  or  the  "Anyone"  mailbox.  Rules  can  move 
messages  to  folders,  delete  messages,  set  "characteristics" 
of  messages  based  on  other  field  values,  or  select  messages 
addressed  to  "Anyone". 

2.6.4.   Comparison  of  QIRM  with  related  work 

In the articles [2], [5], and [4], the authors presented
their ideas on desirable design options and implementation
strategies for structured mailbox systems. Some
considerations in formulating these strategies include:
information overload [2, 4], information sharing [2, 5, 4],
communication filtering, sorting, and prioritizing [2, 5, 4],
the query service [5, 4], and intelligent database management
systems [5, 4].

The features presented in [2], [5], and [4] and those of
QIRM can be compared in several respects.

The scope of QIRM is different from that of the systems
described in [2] (SCGC), [5] (SMSM), and [4] (IL). QIRM is
an information-retrieval system which is intended to access
messages in a user's mailbox. SCGC, SMSM, and IL are
group-communication-oriented mailbox systems for sharing
information. In the environments of SCGC, SMSM, and IL,
users can send, receive, delete, and file messages as well as
retrieve them. A recipient using SCGC, SMSM, or IL can
query both global and local messages. In QIRM, however,
a mail receiver can query only messages that have arrived in
his own mailbox.

QIRM differs from SMSM and IL in the method of
specifying queries. While the queries of SMSM and IL are
based on the use of semi-structured message templates (a
domain-calculus query), QIRM's query is specified in a way
similar to that of SQL (a tuple-calculus query). It should
be mentioned that IL provides a very friendly and convenient
user interface. Messages in IL are composed with a
display-oriented editor and templates that have pop-up menus
associated with the template fields.

SCGC, SMSM and IL provide both automatic and manual
facilities for structuring messages, while QIRM has a manual
function that categorizes messages temporarily according to
the user's query. Structuring of the message set is based on
information input by: the sender (SMSM, SCGC, IL), someone
else, e.g. a leader (SCGC), the recipient (SMSM, SCGC, IL,
QIRM), or automatically by the system (SMSM, SCGC, IL). SMSM
and IL allow a recipient to specify rules for processing
messages. These rules are composed using the same templates
as those used for composing and querying messages. The
facility for user-specified rules is not employed in either
QIRM or SCGC.

Both QIRM and IL use several techniques from artificial
intelligence. To structure messages, the latter employs
frames, each of which contains messages with similar
contents. These frames are arranged into a network using the
frame-inheritance lattice. The messages in QIRM are
structured in a semantic network that consists of many
subsets, and each subset represents a message.

Even though the detailed architecture and techniques
employed in these systems differ from each other, the basic
key ideas are found to be very similar. Messages can be
controlled and selected on the content of messages (e.g.,
author, topic, keyword) by a user's message specification or
by a system function which automatically manipulates the
messages. In order to achieve such structured
communication, it is desirable to develop more
database-oriented and active mailbox systems.


CHAPTER  3 


DESIGN 


3.1.  OBJECTIVES


The objective of this work is to design and implement a
system which allows a user to categorize requests for
messages in his/her mailbox and to retrieve the information
according to the fields and conditions specified by the user.
By employing concepts and techniques of artificial
intelligence, the system can provide some insights for
developing an intelligent, user-friendly Unix mail facility.
Moreover, the project has the potential of reducing many of
the burdens and problems that current users of Unix mail
encounter during their use of this facility.

3.2.   THE  SYSTEM  ENVIRONMENT 

The prototype, which is referred to as 'The QIRM System',
has been developed on a Digital Equipment Corporation VAX
11/780 minicomputer supported by the Berkeley 4.3 BSD Unix
operating system at Kansas State University, and is written in
Franz Lisp Opus 42.

3.3.  SYSTEM DESIGN


3.3.1.  Overview


The diagram shown in Figure 6 presents the framework of
the QIRM system. As can be seen, the system consists of four
major components. The following is a description of each of
these basic components.

(1).  DATABASE  PREPARATION  :  Database  preparation  involves 
the  design  of  how  the  data  are  organized  to  be  used  and  the 
subsequent  loading  of  data  into  the  database  from  the  input 
file(s).

(2).  USER  INTERFACE  :  The  user  interacts  with  the  system 
through  a  query  language.  The  entire  retrieval  design 
process  is  accomplished  through  this  interface. 

(3).  QUERY PROCESSOR : This processor processes a query
submitted by a user. The query is parsed, and the database
is searched for items that match the specific request.
Information about the organization of the database is
incorporated within this processor itself.

(4).  OUTPUT  :  The  output,  which  is  the  result  of  the 
searching  operation  performed  by  the  query  processor,  is 
selectively  generated  according  to  the  user's  query- 
specification. 

Processing by the QIRM system can be broadly divided
into four phases. In phase one the database is constructed;
the data read from the data file(s) are converted into the
desired Lisp structure. The next phase is query processing,
which involves scanning and parsing the query submitted by a
user. The third phase involves searching the database and
retrieving the requested data. The final phase is output
generation. In this phase, the requested information is
displayed on the terminal in the format specified by the
user.

Having provided a basic background of the QIRM system,
we describe each component in considerable detail in the
following text.

Figure 6.  Framework of the QIRM system


3.3.2.   Database  Preparation 

3.3.2.1.   The  Lisp  Notation  and  Indexing  Scheme 

The  database  in  this  system  is  based  on  the  semantic 
network formalism (see Section 2.3). The semantic networks
are augmented with an indexing scheme.

As  expressed  in  Section  2.1.1.,  the  contents  of  a  message 
in  a  Unix  system  mailbox  consist  of  a  header  with  various 
fields  and  a  text  body.  This  structure  can  be  used  for 
storing  mailbox  information  into  the  database.  Figure  7 
presents  a  graphical  description  of  the  information  that  is 
used to structure a database in QIRM. Each message has
seven properties, which are 'sdate, rdate, from, to, subject,
status, body'. A brief description of each field is now
given:

'sdate' indicates the date of sending a message;
'rdate' indicates the date of receiving a message;
'from' indicates the message-sender's id-name;
'to' indicates the message-recipient's id-name;
'subject' indicates the subject of a message;
'status' indicates the status of a message, such as 'Has this
message been read or not?' or 'Is this message old or new?';
'body' indicates the body of a message.

Body1 is a node with two links, one to a node containing
the text portion of the body, and the other to a node
containing control information about a message, namely, the
number of lines of text and the address of the text in the
mailbox file. Rdate1 is a node having four properties which
are 'rd, rw, rm, ry'. The property links 'rd, rw, rm, ry'
indicate the day number, the weekday, the month, and the year
of receiving a message, respectively. Sdate1 is a node
containing four properties which are 'sd, sw, sm, sy'. The
property links 'sd, sw, sm, sy' indicate the day number, the
weekday, the month, and the year of sending a message,
respectively.

Figure 7.  A semantic network showing the structure of
the database in the QIRM system

The format of the semantic network shown in Figure 7
suggests that the information about each of the individual
messages, like msg1, is clustered in a particular place. Any
fact that is associated with msg1 is represented with an
arrow going in or out of the msg1 node. Therefore, having
located msg1, it is possible to gain access to all the
information about it. This is an indexing scheme. If a
certain process in a program requires information about the
subject of msg1, it would not be practical to linearly search
all the nodes in the database to find the fact. It is much
more reasonable to have msg1 point to the information
directly.

There are a variety of techniques that have been
developed for indexing patterns in a database. The technique
employed in this work takes advantage of Lisp property lists
in breaking up a large database into several small ones. As
shown in Figure 7, the database 'MAIL' is composed of many
records such as 'msg1, msg2, ..., msgn'. In order for a
database to support several records at the same time, there
must be an index which keeps track of the records. The
capitalized record name (i.e., if a record name is 'msg1',
'MSG1' becomes its capitalized name) can be used as the name
of the property to index a record under. Thus, when a
record is added to the database, it is stored on a
'capitalized-record-name' property list of 'MAIL'. When an
item is added to a record in the database, the system
identifies the record (a message name) the item belongs to
and stores it on a list under some property name of the
record (a message). For example, suppose the system wants
to add a piece of information such as 'The message subject is
TUITION FEES.' to a record, say msg1, using this scheme. It
can be done by adding this item to the list stored under the
'subject' property of the record 'msg1', which is stored under
the 'MSG1' property of the database 'MAIL'. When fetching
a piece of information such as the subject of msg1 from the
database, the system first obtains the list under the 'MSG1'
property of 'MAIL'. Then, it gets the value under the
'subject' property in this list.
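
The two-level lookup described above can be illustrated with a
small sketch. It is written in Python rather than Franz Lisp,
and nested dictionaries stand in for Lisp property lists; the
names MAIL, add_item, and fetch are chosen here for
illustration only and are not taken from the QIRM source.

```python
# Illustrative sketch only: nested dictionaries play the role of
# Lisp property lists. Function names are hypothetical.
MAIL = {}  # the database; each key acts as a property name on 'MAIL'


def add_item(record_name, prop, value):
    """Append value to the list stored under `prop` of the record that
    is indexed under the capitalized record name."""
    record = MAIL.setdefault(record_name.upper(), {})
    record.setdefault(prop, []).append(value)


def fetch(record_name, prop):
    """Two-step lookup: first the record under MAIL, then the property."""
    record = MAIL.get(record_name.upper(), {})
    return record.get(prop, [])


add_item("msg1", "subject", "TUITION FEES")
print(fetch("msg1", "subject"))
```

Because the record is reached directly through its capitalized
name, no linear search over the other records is needed.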

From the above discussion, we can conclude that semantic
networks suggest a scheme of forward and backward pointers
that makes accessing information very easy. Figure 8 shows
how the attribute-value memory structures in Figure 7 are
represented in Lisp.

3.3.2.2.  Database Loading

After the data structure is defined, the database is
loaded by the database manager; this manager consists of
several procedures. The database loading process involves
determining the source of the data, reading the data file,
constructing the database, and updating the database file. The
question of which data file(s) should be loaded is critical
in terms of efficiency.

As shown in Figure 9, the QIRM system has been designed
to take one of two paths in determining which input file to
read. Which path is chosen by the database manager depends
upon the existence of an old mail database file (referred to
as 'old-db'). The name of the old mail database file is
formed by concatenating the user's login-id and the string
'data'. For example, if 'songhee' is the user's login-id,
then the 'old-db' file is named 'songheedata'. That is, the
database in 'songheedata' is the one that was constructed
when QIRM was last called by the user 'songhee'.

When a user invokes the QIRM system, the database manager
checks whether an 'old-db' exists or not. If no 'old-db' is
found, the database manager assumes that it is the
first time that the user has employed the QIRM system. In
this case, the manager takes the user's mailbox as the only
data file and loads the data from the mailbox into a database
('path two' in Figure 9). However, if 'old-db' is found,
the manager uses both the 'old-db' and the mailbox as data
files to reload data into a database. The reloading process
includes reading the two data files to compare a key
portion of each message in one file with that in the other
file. This comparison is needed for the manager to identify
the messages deleted from, or added to, the most recent
mailbox ('path one' in Figure 9). Using the algorithm
(algorithm A) described above, the database manager
constructs a database and stores it into 'old-db', which will
be used as one of the input files for the next usage.

Figure 9.  Algorithm A for loading a database

3.3.2.3.  Efficiency Consideration

In order to choose an efficient database loading process,
two algorithms (A and B) are compared in terms of time
complexity. Algorithm A was discussed in the previous
section and shown in Figure 9, while algorithm B is
illustrated in Figure 10.

In algorithm B, a user's mailbox is taken as the
only data file. Whenever the system is invoked, the database
manager reads through the mailbox and constructs a database.
No physical database file exists in this algorithm.

Algorithm A uses several procedures to minimize the
regular user's waiting time for database loading, by
distinguishing the first-time user from the regular user of
the system. In order for the text data in a mailbox to be
loaded into a database, the data must be organized and
converted to the desired format. Such processing is
not required for loading the data from 'old-db', which has
been stored in the Lisp format consistent with the database
structure. Therefore, once the database manager finds that
the key portion of a message, say msgN, in 'old-db' is the
same as that in the mailbox, the whole information about msgN
in 'old-db' can be reloaded into a database quickly and easily.
Algorithm A takes extra time for updating 'old-db' and
determining the identity of messages in the two data files. In
order to minimize the time for the latter, this algorithm
uses very small key portions of messages for the comparison.
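
The difference between the two loading strategies can be
sketched as follows. This is a simplified Python illustration
under stated assumptions, not the Franz Lisp implementation:
the mailbox is taken to be a list of raw message strings, the
key portion is the first header line, 'old-db' is held in
memory instead of in a file, and all function names and the
sample senders are hypothetical.

```python
def key_of(raw_message):
    """Key portion: the first header line (sender id and receipt date)."""
    return raw_message.splitlines()[0]


def parse_message(raw_message):
    """Stand-in for the costly conversion of mailbox text into a record."""
    lines = raw_message.splitlines()
    return {"key": lines[0], "body": lines[1:]}


def load_b(mailbox):
    """Algorithm B: convert every message on each invocation."""
    return [parse_message(raw) for raw in mailbox]


def load_a(mailbox, old_db):
    """Algorithm A: reuse the records whose key is already in old_db."""
    if old_db is None:  # first-time user: no old-db exists yet
        return load_b(mailbox)
    saved = {record["key"]: record for record in old_db}
    database = []
    for raw in mailbox:
        record = saved.get(key_of(raw))
        # Only messages that are new to the mailbox are converted.
        database.append(record if record is not None else parse_message(raw))
    return database  # messages deleted from the mailbox simply drop out


mailbox = ["From alice Wed Jan 14 1987\nhello",
           "From bob Thu Jan 15 1987\nfees"]
old_db = load_a(mailbox[:1], None)  # first use: only one message so far
db = load_a(mailbox, old_db)        # later use: one record is reused
print(len(db), db[0] is old_db[0])
```

In this sketch the saving of algorithm A comes from skipping
parse_message for every message whose key is already known.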

The performance of each algorithm was evaluated by
running it with a set of sample data which were saved into a
mailbox 'songhee' from the 14th of January 1987 to the 20th of
July 1987, and by measuring the user time needed to
load all the data from the data file(s) into a database.
The data set includes various messages such as local
messages (sent and received on ksuvax1), UUCP messages, and
messages from CSNET, BITNET, KSUVM, JUNET via CSNET, ARPANET,
and USENET. In each comparison, the key portions (i.e., the
first line of a message header, containing a sender's id and
the date of receiving the message) of the last five messages
were modified for algorithm A, so that the system recognizes
them as new messages.

Figure 10.  Algorithm B for loading a database




For the purpose of predicting the loading times for large
sets of messages, the predicted loading times, T (seconds), for
each algorithm were formulated by the linear regression
method. Minimization of the sum of squares of the
deviations between the measured times and the linear
expression yields:

T =  1.72 + 0.33N        Algorithm A

T = -3.40 + 0.39N        Algorithm B

where N denotes the number of messages. The correlation
coefficients for algorithms A and B were 0.997 and 0.999,
respectively, which represent good linearity of the
measured loading times in both cases.
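
As a small worked check, the two fitted lines can be evaluated
directly. The sketch below reads each expression as an
intercept plus a slope times N and solves for the number of
messages at which the two predicted times are equal; the
helper name predicted_time is introduced here for illustration
only.

```python
def predicted_time(intercept, slope, n):
    """Predicted loading time in seconds for n messages."""
    return intercept + slope * n


A = (1.72, 0.33)   # fitted coefficients for algorithm A
B = (-3.40, 0.39)  # fitted coefficients for algorithm B

# Setting 1.72 + 0.33N equal to -3.40 + 0.39N gives N = 5.12 / 0.06.
crossover = (A[0] - B[0]) / (B[1] - A[1])
print(round(crossover))

for n in (50, 100):
    print(n, predicted_time(*A, n), predicted_time(*B, n))
```

Under this reading the lines cross at about 85 messages, so
the fitted model predicts that algorithm B is slightly faster
below that point and algorithm A is faster above it.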

As shown in Figure 11, algorithm A seems to be less
efficient than algorithm B for a small number (under 50) of
messages, but the difference is negligible. The times used for
database loading by the two algorithms are almost identical for
fifty to one hundred messages. For a large set of data (more
than 100 messages), however, algorithm A becomes more
efficient than algorithm B. The larger the data set, the
more loading time is saved by using algorithm A. For
example, a user with 360 messages can save more than 15% of
the loading time which is required by algorithm B.

When we consider the complexity in terms of space,
algorithm B is superior to algorithm A. The space for
'old-db' is not needed in algorithm B. Also, the source code
for algorithm B is shorter than that of algorithm A by 2937
bytes. However, for QIRM, with its heavy and direct
user interface, time efficiency is a much more critical
factor than space efficiency in choosing
an algorithm. For this reason, algorithm A has been
chosen for loading a database in QIRM.

Figure 11.  Comparison of the loading time of algorithm A with
that of algorithm B


3.3.3.  Queries - User Input and Output

As mentioned in Chapter 1, a query language is very
useful for retrieving information from databases. Even
though the query language is very different from Lisp, it is
convenient to describe the query language in terms of the
general framework of Lisp. It is described as a collection
of primitive elements, together with means of combination
that enable a user to combine simple elements to create more
complex elements, and means of abstraction that
enable users to regard complex elements as single conceptual
units. The mail-query language implemented in this project
has been designed to take advantage of this aspect of Lisp.

In order to illustrate the features of the query system
in QIRM, this section shows how QIRM can be used to
manage the database which is built from the information in a
mailbox. The language provides pattern-directed access to
the information.

3.3.3.1.   Simple  Queries 

The  mail-query  language  allows  users  to  retrieve 
information  from  the  database  by  posing  queries  in  response 
to  the  system's  SELECT-WHERE  prompts. 

The syntax of a simple query is

SELECT   ( <field>* | * )
WHERE    ( <query-pattern> ) | ( )

Some points concerning this query are:

1.  a.  <field>* indicates a sequence of one or more elements
        of the form <field>, where <field>, in turn, is one
        of sdate, rdate, from, to, subject, status, and body.

    b.  '|' indicates alternation; for example, 'A|B'
        means a choice between A and B.

    c.  '*' is shorthand for all fields.

2.1.  <query-pattern> has the following structure:

          <predicate-operator> <propertyname> <propertyvalue>

      where

      a.  <predicate-operator> can be <, >, <=, >=, or =.

      b.  <propertyname> can be one of sw, sd, sm, sy,
          rw, rd, rm, ry, from, to, subject, status, and
          text.

      c.  1.  <propertyvalue> is the value to be
              searched for in the named property of
              messages.
          2.  If the property value consists of more than
              one word, it should be parenthesized;
              otherwise, it should not. A word is
              a sequence of any characters except a blank.

2.2.  ( ) indicates that no condition is
      specified for retrieving messages.
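
The rules listed above for a query pattern can be made
concrete with a short sketch. It is written in Python for
illustration and is not the QIRM parser; the function name
parse_where and the decision to fold all names and values to
lower case are assumptions of this sketch.

```python
OPERATORS = {"<", ">", "<=", ">=", "="}
PROPERTIES = {"sw", "sd", "sm", "sy", "rw", "rd", "rm", "ry",
              "from", "to", "subject", "status", "text"}


def parse_where(clause):
    """Parse '(= rw wed)' or '(= subject (tuition fees))' into
    (operator, property, [words]); return None for '()'."""
    tokens = clause.replace("(", " ( ").replace(")", " ) ").split()
    if len(tokens) < 2 or tokens[0] != "(" or tokens[-1] != ")":
        raise ValueError("a WHERE clause must be parenthesized")
    inner = tokens[1:-1]
    if not inner:
        return None  # '( )' means that no condition is specified
    if len(inner) < 3:
        raise ValueError("expected <operator> <property> <value>")
    op, name, value = inner[0], inner[1].lower(), inner[2:]
    if op not in OPERATORS:
        raise ValueError("unknown predicate operator: " + op)
    if name not in PROPERTIES:
        raise ValueError("unknown property name: " + name)
    if value[0] == "(" and value[-1] == ")":
        words = value[1:-1]  # parenthesized multi-word value
    elif len(value) == 1:
        words = value        # single word, no parentheses
    else:
        raise ValueError("a multi-word value must be parenthesized")
    if not words or "(" in words or ")" in words:
        raise ValueError("malformed property value")
    return op, name, [word.lower() for word in words]


print(parse_where("(= rw wed)"))
print(parse_where("(= subject (tuition fees))"))
print(parse_where("()"))
```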

The input query specifies that one is looking for entries
in  the  database  that  match  a  certain  pattern.  Using  the 
matching  operation  that  will  be  described  in  section 
3.3.4.1.,  the  query  determines  whether  the  desired  pattern  is 
in  the  database  and  which  records  of  the  database  contain  it. 
The  system  responds  to  a  simple  query  by  showing  the  values 
of  requested  fields  in  all  records  found  which  meet  the 
criteria  specified  by  the  pattern. 

A query's response involves the output of data. Sometimes
the amount of data is very large, and it may be unexpected by
the user. In such a case, it is convenient to tell the user
the number of messages retrieved, output only the first item,
and then inform the user that more data can be supplied. The
system gives the number of additional response records and
asks the user about their disposition.
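
The paging behaviour described above can be sketched as follows.
This is a minimal illustration in Python rather than the Franz
Lisp of the actual system; the function name page_results, the
wants_more callback, and the wording of the status lines are
assumptions made for the example, not QIRM's own.

```python
def page_results(records, wants_more):
    """Return the lines shown for one query, one record per page.

    `wants_more` is a hypothetical callback standing in for the user's
    answer to the 'More? (y/n)' question; it returns True for 'y'.
    """
    lines = []
    if not records:
        return lines
    lines.append(f"{len(records)} message(s) recalled")
    for shown, record in enumerate(records, start=1):
        lines.append(record)
        left = len(records) - shown
        if left == 0:
            break
        lines.append(f"{left} message(s) left. More? (y/n)")
        if not wants_more():
            break  # the user abandons the rest and starts a new query
    return lines
```

Answering 'n' after the first record stops the display, so only
the count, the first record, and the prompt line are produced.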

For  example,  to  see  all  subjects  of  mail  items  which  were 
received  on  Wednesday,  one  can  say 

SELECT   (subject) 

WHERE    (=   rw   wed). 

The QIRM system does not distinguish between upper and lower
case letters. Thus the above query could equally well have
been entered as

SELECT   (SUBJECT)
WHERE    (= rw WED).

The system would display on the screen the subjects of all
the records whose structures satisfy the condition of the
'WHERE' clause. Figure 12 shows an example of how the output
is displayed at the terminal and how the system interacts
with a user.


**  3 message recalled !  **
msg4

subject :  new arrival

2 messages left !   More? (y/n) n

SELECT   (subject)
WHERE    (> rw wed)

**  1 message recalled !  **
msg1

subject :  tuition fees

SELECT   stop

**  bye bye  **
%


Figure 12.   Output Examples of QIRM


During retrieval of information, the user may abort the
display of the remaining messages and start a new query.
That is, by choosing 'n' as the answer to the system's 'More?
(y/n)' question, the user can start a new dialogue. The user
can then either continue retrieving information by specifying
a choice of fields and conditions in response to the system's
new SELECT-WHERE prompts, or exit the system by entering
'stop' at the SELECT prompt.

Another example is:

SELECT   (sdate body)
WHERE    (= subject (A.C.M. meeting))

The  system  would  respond  with  the  selected  items  from 
messages  which  meet  the  conditions  specified  by  the  'WHERE' 
clause. 

In this example, one of the outputs might be:

msg10

Date: Wed, 14 Jan 87 08:45:04 est

< ... the message body of A.C.M. meeting ... >

If one wants to retrieve the values of all seven fields of
messages, an asterisk can be used:

SELECT   (*)
WHERE    (= subject meeting)

would give all of the fields of each mail item whose subject
contains the word 'meeting'. Notice that a user can specify
either the exact subject of a message or a keyword of the
subject.

Thus, the simplest form of query, which will display all of
the messages in the database in their entirety, is:

SELECT   (*)
WHERE    ( )
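
Because every clause is a parenthesized list of blank-delimited
words, reading a clause amounts to reading a nested list. The
following is a minimal sketch in Python (not the Franz Lisp
reader that QIRM actually relies on); the names tokenize,
read_form, and parse_clause are hypothetical. Input is folded to
lower case, which mirrors the case-insensitivity noted above.

```python
def tokenize(text):
    """Split a clause into '(', ')' and blank-delimited words, lower-cased."""
    return text.lower().replace("(", " ( ").replace(")", " ) ").split()


def read_form(tokens):
    """Consume one word or one parenthesized list from the token list."""
    if not tokens:
        raise SyntaxError("unexpected end of clause")
    token = tokens.pop(0)
    if token == ")":
        raise SyntaxError("unexpected ')'")
    if token != "(":
        return token
    items = []
    while tokens and tokens[0] != ")":
        items.append(read_form(tokens))
    if not tokens:
        raise SyntaxError("missing ')'")
    tokens.pop(0)  # discard the closing ')'
    return items


def parse_clause(text):
    """Parse a whole SELECT or WHERE clause into nested Python lists."""
    tokens = tokenize(text)
    form = read_form(tokens)
    if tokens:
        raise SyntaxError("unexpected text after clause")
    return form
```

For instance, parse_clause("(= subject (A.C.M. meeting))") gives
the operator, the property name, and a two-word value as a nested
list, while parse_clause("( )") gives an empty list for the 'no
condition' case.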


3.3.3.2.   Compound Queries

Simple queries form the primitive operations of the
mail-query language. In order to form compound operations,
this query language provides a means of combination which
mirrors the formulation of logical expressions. Here, the
logical connectives 'and', 'or', and 'not' can be considered
as operations built into the query language.

The syntax of a compound query with 'and' is

SELECT   ( <field>+ ) | (*)
WHERE    (and <compound-query-pattern>++ )

Some points concerning this query are:

1.  <compound-query-pattern>++ indicates the set consisting of
    two or more <compound-query-pattern>s.

2.  <compound-query-pattern> can be any of the following:

    (and <compound-query-pattern>++ )
    (or <compound-query-pattern>++ )
    (not ( <query-pattern> ) )
    ( <query-pattern> )

The compound query with the 'and' connective is satisfied by
all sets of values for the property names that simultaneously
satisfy all of the <compound-query-pattern>s.

The following is a compound query example which illustrates
the use of 'and':

SELECT   (subject)
WHERE    (and (>= sm feb) (= to grads))

As the response to this query, the system shows all the
subjects of messages which were sent to the graduate students
from February to December.

Another means of constructing compound queries is through
'or'. The syntax of this query is:

SELECT   ( <field>+ ) | (*)
WHERE    (or <compound-query-pattern>++ )

This query is satisfied by all sets of values for the
property names that satisfy at least one of the
<compound-query-pattern>s.

An example of a compound query using 'or' is:

SELECT   (body)
WHERE    (or (= text (unix system)) (= text (mail box)))

The result of this query is all the message bodies which
contain either the phrase 'unix system' or the phrase 'mail
box'.

Queries can also be formed with 'not'.

SELECT   ( <field>+ ) | (*)
WHERE    (not ( <query-pattern> ) )

is satisfied by all values of the property name that do not
satisfy the <query-pattern>.

A query example formed with 'not' is:

SELECT   (from)
WHERE    (not (= sm jan))

The result of this query is the senders' id-names of all the
messages which were not sent in January.

Also, one can combine 'and', 'or', and 'not' to specify
conditions in a WHERE clause, as shown in the following
example:

SELECT   (subject body)
WHERE    (or (and (= sd 20) (not (= sm jan)))
             (= status r))

As  the  result  of  this  query,  the  system  shows  a  user  the 
subjects  and  text  bodies  of  all  the  messages  which  either 
were  sent  on  the  20th  of  any  month  except  January  or  have 
already  been  read  by  the  user. 
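
The grammar of compound queries given in this section can be
checked mechanically. The sketch below is written in Python as
an illustration only; it assumes the clause has already been read
into nested lists of lower-case words, and the names OPERATORS,
PROPERTY_NAMES, is_query_pattern, and is_compound_pattern are
hypothetical rather than taken from the QIRM source.

```python
OPERATORS = {"<", ">", "<=", ">=", "="}
PROPERTY_NAMES = {"sw", "sd", "sm", "sy", "rw", "rd", "rm", "ry",
                  "from", "to", "subject", "status", "text"}


def is_query_pattern(form):
    """True for <predicate-operator> <propertyname> <propertyvalue>."""
    if not (isinstance(form, list) and len(form) == 3):
        return False
    op, name, value = form
    if not (isinstance(op, str) and isinstance(name, str)):
        return False
    if op not in OPERATORS or name not in PROPERTY_NAMES:
        return False
    if isinstance(value, str):
        return True  # a single word is written without parentheses
    # a parenthesized value must hold more than one word
    return (isinstance(value, list) and len(value) >= 2
            and all(isinstance(word, str) for word in value))


def is_compound_pattern(form):
    """True for any of the four <compound-query-pattern> alternatives."""
    if not isinstance(form, list) or not form:
        return False
    head = form[0]
    if head in ("and", "or"):
        # 'and' and 'or' take two or more compound-query-patterns
        return len(form) >= 3 and all(is_compound_pattern(f) for f in form[1:])
    if head == "not":
        # 'not' takes exactly one simple query-pattern
        return len(form) == 2 and is_query_pattern(form[1])
    return is_query_pattern(form)
```

Applied to the combined example above, the check succeeds; an
'and' with a single operand, or a 'not' wrapped around an 'and',
is rejected, as the syntax rules require.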


3.3.4.   The Query Processor

In this section, an overview of the query processor's
general structure is presented.

The  query  processor  is  organized  around  a  central 
operation  called  network-fragment  matching.  This  section 
begins  by  discussing  network-fragment  matching  and  how  it 
permits  both  simple  and  compound  queries  to  be  implemented. 
This  section  also  shows  how  the  entire  query  interpreter 
works  by  utilizing  a  function  which  classifies  expressions. 

3.3.4.1.   Matching

The mechanism used by this processor is based on matching
semantic network structures: a fragment of a semantic net is
constructed to represent an object (the query-pattern which
is sought). This fragment is matched against the semantic
net database to see if such an object exists. Once the
object has been found, variable nodes in the fragment are
bound to the values which they must possess in order to make
the match perfect.

Suppose we wish to retrieve information based on the
following request, meaning 'Show all subjects of messages
whose status is New':

SELECT   (subject)
WHERE    (= status N)

We construct the fragment as shown in Figure 13.


+--------+   status    +-----+
|  msg?  | ----------> |  N  |
+--------+             +-----+
     |
     |
     |  subject
     |
     v


Figure 13.   A Fragment of a Semantic Net


This  fragment  is  then  matched  against  the  database  in  a 
search  for  a  'msg?'  node  that  is  connected  to  a  node 
containing  'N'  by  a  'status'  link.  If  it  is  found,  the  node 
to  which  the  SELECTed  field  link  (a  subject  link)  points  is 
bound  in  a  partial  match  and  might  be  used  to  formulate  a 
query  response  such  as: 

'msg1
subject : tuition fees'.

Had  no  match  been  found,  the  answer  would  have  been 
'No  message  whose  status  is  'N'  is  found'. 
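
The matching step just described can be sketched as follows.
This is an illustration in Python, not the Lisp implementation;
it assumes the semantic net is held as (node, link, value)
triples, and the names NET and match_fragment, together with the
status assigned to msg4, are hypothetical.

```python
# A toy semantic net: each triple is (message node, link, value).
NET = [
    ("msg1", "status", "n"),
    ("msg1", "subject", "tuition fees"),
    ("msg4", "status", "r"),          # hypothetical status for msg4
    ("msg4", "subject", "new arrival"),
]


def match_fragment(net, link, value, selected_link):
    """Find nodes joined to `value` by `link`, then bind `selected_link`.

    Returns (node, bound value) pairs, or an empty list when no node in
    the net matches the fragment.
    """
    bindings = []
    for node, node_link, node_value in net:
        if node_link == link and node_value == value:
            # the fragment matches this node; follow the SELECTed link
            for other, other_link, other_value in net:
                if other == node and other_link == selected_link:
                    bindings.append((node, other_value))
    return bindings
```

Calling match_fragment(NET, "status", "n", "subject") binds the
subject of msg1; an empty result corresponds to the 'No message
... is found' answer.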

3.3.4.2.   Stream Implementation

The testing of query patterns against network fragments
utilizes the notion of streams. A stream is simply a
sequence of data objects. A straightforward implementation
of streams can be done using lists in Lisp. With a single
fragment of a semantic net (a query pattern), the matching
process runs through a copy of the database entries one by
one. This copy of the database entry list is given as an
input stream (stream A in Figure 14) for the matching
process. Each entry is a message of the 'MAIL' database and
contains pointers to several nodes containing its property
values. For each database entry, the process which attempts
the matching generates either a symbol indicating that the
match has failed or the entry itself. The results for all of
the given database entries are collected into a stream, which
is passed through a filter to weed out the failures. The
result is a stream of all the database entries that contain
items matching the query pattern (see stream B in Figure 14).

In the QIRM system, a query takes a copy of all the database
entries as an input stream for a 'where-processor' and
performs the network-fragment matching operation for every
entry in the stream, as indicated in Figure 15. That is, for
the given input stream (stream I), the where-processor
generates a new, filtered stream (stream II) consisting of
all entries which have items satisfying the query condition.
This filtered stream (II) is taken by a 'select-processor' to
generate the final output of the query.


  stream A         +-----------------+        stream B
-----------------> |  query-pattern  | ----------------->
  (a copy of       +-----------------+        (a copy of
  database                                    db entries,
  entries)                                    filtered)


Figure 14.   A query-pattern processes a stream of entries.


                     WHERE                        SELECT
                  (conditions)                   (fields)
                       |                            |
                       v                            v
  stream I       +------------+  stream II    +------------+  final output
---------------> |   WHERE-   | ------------> |  SELECT-   | ------------->
  (a copy of     | PROCESSOR  | (a filtration | PROCESSOR  |  stream
  all db         +------------+  of stream I) +------------+  (a stream of
  entries)                                                    field-values
                                                              associated with
                                                              db entries of
                                                              stream II)


Figure 15.   A query processor consists of
             a where-processor and a select-processor.

3.3.4.3.   Simple Query and Stream Implementation

To answer a simple query, the system uses the query with an
input stream consisting of copies of all database entries.
The output stream from the where-processor contains the
filtered entries. This stream of filtered database entries
is then used to generate a stream of copies of the SELECTed
field-values, and this is the stream that is finally printed
at the terminal.
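
The two-stage pipeline for a simple query can be sketched with
Python generators standing in for Lisp list streams. This is an
illustration under stated assumptions, not the QIRM code: the
names FAILED, where_processor, and select_processor are
hypothetical, only the '=' operator is handled, and the two
sample entries (in particular the status given to msg4) are
invented for the example.

```python
FAILED = object()  # symbol produced when an entry does not match


def where_processor(entries, pattern):
    """Stream I -> stream II: attempt the match, then weed out failures."""
    _, name, value = pattern  # only '=' patterns in this sketch
    attempts = (entry if entry.get(name) == value else FAILED
                for entry in entries)
    return (result for result in attempts if result is not FAILED)


def select_processor(entries, fields):
    """Stream II -> output: message number plus the SELECTed field values."""
    for entry in entries:
        yield entry["id"], {field: entry[field] for field in fields}


MAIL = [
    {"id": "msg1", "status": "n", "subject": "tuition fees"},
    {"id": "msg4", "status": "r", "subject": "new arrival"},
]

# SELECT (subject) WHERE (= status N), run on a copy of the entries
stream_ii = where_processor(list(MAIL), ("=", "status", "n"))
output = list(select_processor(stream_ii, ["subject"]))
```

Keeping the failure symbol and the filter as separate steps
follows the description in section 3.3.4.2; a direct filtering
loop would give the same stream.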

3.3.4.4.  Compound  Queries  and  Stream  Implementation 

The real elegance of the stream implementation is evident
when compound queries are considered. The processing of
compound queries makes use of the ability of the filter to
demand that a match be consistent with a specified network
fragment.

For example, to handle the 'and' of two compound-query-
patterns, such as

(and (= body (rules and inheritance))
     (= to faculty))

(this query can be informally stated as: 'Find all messages
whose text bodies contain the phrase "rules and inheritance"
and whose receivers are faculty members.'), the query
processor first finds all entries containing the fragment
that matches the following pattern: (= body (rules and
inheritance)). This produces a stream of entries, each of
whose bodies contains the phrase 'rules and inheritance'.
From this new filtered stream, the processor then finds all
entries that contain the fragment matching the second
pattern: (= to faculty). The 'and' of two
compound-query-patterns (see section 3.3.3.2) can be viewed
as a series combination of the two component
compound-query-patterns, as shown in Figure 16. The entries
that pass through the first compound-query-pattern are
filtered, and further filtered by the second
compound-query-pattern.

Figure 17 shows the analogous method for computing the 'or'
of two compound-query-patterns as a parallel combination of
the two component compound-query-patterns. The input stream
of entries is filtered separately by each
compound-query-pattern. The two resulting streams are then
merged (for example, by appending the streams and eliminating
the duplicated entries) to produce the output stream of
processing the 'or' clause.

From the stream-of-entries viewpoint, the 'not' of some
query-pattern acts as a filter that removes all entries
having items specified in the query-pattern. For instance,
given the clause

(not (= from mary))

the system attempts, with the given database entries, to
produce the stream of entries consisting of network fragments
that satisfy (= from mary). Then, the system removes from
the input stream all entries for which such fragments exist.
The result is a stream consisting of only those entries in
which the binding for 'from' does not satisfy (= from mary).

For example, in processing the query

(and (= text (key finding))
     (not (= from mary)))

the first clause will generate a stream of entries each of
whose bodies contains the phrase 'key finding'. Taking 'and'
with the 'not' clause will filter the stream by removing all
entries in which the bindings for 'from' satisfy the
restriction that the message-sender's id-name is 'mary'.

Queries containing '<', '>', '<=', or '>=' as the predicate
operator can be implemented with a similar filter on entry
streams. The system uses each entry in the stream to
instantiate the property variable (referred to as the
property-name) in the query pattern and then applies the
corresponding Lisp predicate. Then the system removes from
the input stream all entries for which the predicate fails.
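
The series, parallel, and removal combinations described in this
section can be sketched as functions that build stream filters.
The code is Python, used here only for illustration; the names
PREDICATES, simple, series, parallel, and negate, and the four
sample entries, are assumptions, and streams are represented as
plain lists, much as the thesis represents them as Lisp lists.

```python
import operator

PREDICATES = {"=": operator.eq, "<": operator.lt, ">": operator.gt,
              "<=": operator.le, ">=": operator.ge}


def simple(op, name, value):
    """Filter for one query-pattern: keep entries where the predicate holds."""
    test = PREDICATES[op]
    return lambda entries: [e for e in entries if test(e[name], value)]


def series(*filters):
    """'and': each filter works on the output stream of the one before."""
    def run(entries):
        for f in filters:
            entries = f(entries)
        return entries
    return run


def parallel(*filters):
    """'or': filter the input separately, append, and drop duplicates."""
    def run(entries):
        merged = []
        for f in filters:
            for entry in f(entries):
                if entry not in merged:
                    merged.append(entry)
        return merged
    return run


def negate(f):
    """'not': remove from the input every entry that the filter matches."""
    def run(entries):
        matched = f(entries)
        return [e for e in entries if e not in matched]
    return run


MAIL = [
    {"id": "msg1", "sd": 20, "sm": "jan", "status": "n"},
    {"id": "msg2", "sd": 20, "sm": "mar", "status": "n"},
    {"id": "msg3", "sd": 5, "sm": "feb", "status": "r"},
    {"id": "msg4", "sd": 7, "sm": "apr", "status": "n"},
]

# (or (and (= sd 20) (not (= sm jan))) (= status r))
query = parallel(series(simple("=", "sd", 20),
                        negate(simple("=", "sm", "jan"))),
                 simple("=", "status", "r"))
ids = [entry["id"] for entry in query(MAIL)]
late = [entry["id"] for entry in simple(">", "sd", 6)(MAIL)]
```

Here ids holds the messages sent on the 20th outside January
followed by the message already read, and late holds the
messages sent after the 6th of a month.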


Figure 16.   The 'and' combination of
             two compound-query-patterns


Figure 17.   The 'or' combination of
             two compound-query-patterns

3.3.4.5.   The Query Evaluator and the Query Driver

The function that coordinates the matching operations is
called 'qeval', and it plays a role analogous to that of the
'eval' function for Lisp. 'qeval' takes as inputs a query
and a stream made from a copy of the database entries. Its
output is a stream of selected field-values of the database
entries corresponding to successful matches to the query
pattern, as indicated in Figure 15. Like 'eval', 'qeval'
classifies the different types of expressions
(query-patterns) and dispatches each to an appropriate
function. There is a function for each special form such as
and, or, not, <, >, <=, >=, and =.

The driver loop reads from the terminal a user request,
which is specified following the SELECT-WHERE prompts. The
SELECT and WHERE clauses indicate, respectively, the SELECTed
property links and the condition on the messages that the
user wants to retrieve. For each query, the driver calls
qeval with the WHERE clause and a stream that consists of a
copy of all of the database entries. This produces the
stream of entries which are the result of all possible
matches performed by the 'where-processor' (refer to Figure
15 and section 3.3.4.2). For each entry in this resulting
stream, the driver instantiates the SELECTed fields' values
in the entry. This stream of instantiated field values is
then printed with the associated message number.
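
The dispatching role of 'qeval' and the work of the driver can be
sketched as follows. This is a Python illustration of the idea,
not a transcription of the Lisp source: the handler names, the
SPECIAL_FORMS table, run_query, and the sample entries are
hypothetical, and the terminal prompts of the real driver loop
are left out.

```python
import operator


def q_and(query, entries):
    for sub in query[1:]:            # series combination
        entries = qeval(sub, entries)
    return entries


def q_or(query, entries):
    merged = []                      # parallel combination, no duplicates
    for sub in query[1:]:
        for entry in qeval(sub, entries):
            if entry not in merged:
                merged.append(entry)
    return merged


def q_not(query, entries):
    matched = qeval(query[1], entries)
    return [e for e in entries if e not in matched]


def compare(test):
    """Build the handler for one predicate operator."""
    def handler(query, entries):
        _, name, value = query
        return [e for e in entries if test(e[name], value)]
    return handler


SPECIAL_FORMS = {
    "and": q_and, "or": q_or, "not": q_not,
    "=": compare(operator.eq), "<": compare(operator.lt),
    ">": compare(operator.gt), "<=": compare(operator.le),
    ">=": compare(operator.ge),
}


def qeval(query, entries):
    """Classify the query by its first element and dispatch to a handler."""
    if not query:                    # WHERE ( ): no condition
        return entries
    return SPECIAL_FORMS[query[0]](query, entries)


def run_query(select, where, database):
    """One pass of the driver: filter a copy, then instantiate the fields."""
    matched = qeval(where, list(database))
    return [(e["id"], {field: e[field] for field in select}) for e in matched]


MAIL = [
    {"id": "msg1", "from": "mary", "sm": "jan", "subject": "tuition fees"},
    {"id": "msg2", "from": "kim", "sm": "feb", "subject": "new arrival"},
]
result = run_query(["from"], ["not", ["=", "sm", "jan"]], MAIL)
```

A table keyed on the first element of the query keeps the
dispatcher itself to a single lookup, so a further special form
would need only one more table entry.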


3.4.   SUMMARY

QIRM is a prototype information retrieval system that is
designed to provide not only a functional enhancement to the
4.3 BSD Unix mail facility, but also some insight into the
incorporation of Artificial Intelligence techniques into the
Unix mail facility.

The  database  in  QIRM  is  based  on  the  semantic  network 
structure  and  also  utilizes  an  indexing  scheme.  By 
employing  this  approach,  the  entire  database  can  be  searched 
very easily. The process of loading the data into a
database involves determining the data source, constructing a
database from the data file, and storing the database in a
file. The decision concerning which file is used as the
source of the data is based upon the existence of an old mail
database file. This approach distinguishes the first-time
user from the regular user of the system and minimizes the
latter's waiting time for loading a database.

The QIRM system increases the selectivity of information
retrieved by allowing the user to specify a request using the
mail-query language. The query language used in this system
consists of the simple query and the compound query. This
language provides a user-friendly interface and
pattern-directed access to the messages. The query processor
is organized around network-fragment matching and the stream
implementation. A query takes an input stream of database
entries and performs the matching operation for every entry
in the stream. As its output, the query generates a new
stream consisting of all SELECTed field values associated
with the messages which have structures satisfying the
query's WHERE condition.


CHAPTER  4 
CONCLUSIONS  AND  POSSIBLE  ENHANCEMENTS 

The characteristics of the approach based on structuring
the mailbox by organizing the contents of messages have been
illustrated through the description of the QIRM prototype
system. A brief review of electronic mailbox systems, and of
the concepts and tools used in the prototype, has also been
provided.

The QIRM system provides a user-friendly interface using a
mail-query language based on the tuple calculus, and employs
techniques from artificial intelligence. Users can query
messages by specifying the category of the messages they want
to retrieve. This approach organizes the information in the
mailbox and increases the selectivity of the information
retrieved.

Based  upon  the  concepts  and  features  of  QIRM,  it  has  been 
demonstrated  that  the  work  on  information  retrieval  systems 
and  database  management  systems  is  potentially  relevant  to 
the  design  of  structured  electronic  mail  systems. 

It is desirable that QIRM be built directly on top of the
existing Unix mail facilities. In this way, users could
continue to operate on their mail as usual, and also access
the messages in the more flexible ways which QIRM provides.
Such a system could provide facilities for updating each
user's database file automatically whenever a message
arrives in or is deleted from the mailbox. With this
automatic updating, the query system could maximize both
efficiency and user friendliness by eliminating the
database-loading process which currently is needed before
the retrieval facilities of QIRM can be utilized.

Many  facilities  could  be  added  to  enhance  the 
intelligence  of  the  system.  For  example,  in  order  to  perform 
more  intelligent  interpretation  of  the  messages,  the  system 
could also allow user-specified rules which automatically
screen  messages  arriving  in  a  user's  mailbox.  Also,  this 
mechanism  could  be  used  to  sort  messages  into  different 
categories  according  to  the  individual  user's  preference. 
These  sorting  facilities  could  examine  the  fields  and  body 
presented  in  the  mail  and  deduce  the  message  classification 
based  on  its  set  of  rules.  In  addition,  the  system  could 
employ  these  rules  to  present  the  user  with  an  overview  of 
the  messages  currently  available;  thus,  a  user  could  easily 
pick  what  is  of  interest. 

In the current Unix mail system, the bare login name is used
as a user-id only when a message is sent to a person on the
same machine. For example, if one wants to send messages to
people on the Arpanet, a recipient's id has the form
'id@host', where 'id' is the login name of the recipient and
'host' is the name of the machine where the recipient can be
found on the Arpanet. The way of specifying the user's id
varies in a manner which depends on the type of network
involved. This mechanism raises one drawback of the QIRM
system: since QIRM treats a bare login name and a login name
which is a concatenation of the bare login name and other
strings (e.g. the names of systems) as different user-ids,
querying messages based on the user-id sometimes produces
unexpected output. Another drawback is that QIRM does not
provide an intelligent function for determining the identity
of an object (the partial property value). For example,
'farmer', 'farmer 1', and 'farmers' are recognized as
different objects, whereas all three should be interpreted as
identical. This problem is caused by the system's search
procedure. In order to perform fast searching, this
procedure scans and recognizes an object not character by
character but word by word. (A word is a sequence of any
characters except blank.) Since it is more desirable to
develop a system which is both efficient and intelligent, a
way of accomplishing this should be sought.
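
Both drawbacks follow directly from comparing whole
blank-delimited words. The short Python sketch below only
reproduces the behaviour described above; it is not QIRM's
search procedure, and the names words and has_word as well as
the host name 'somehost' are made up for the illustration.

```python
def words(text):
    """Split text into blank-delimited words, ignoring case."""
    return text.lower().split()


def has_word(text, word):
    """Word-by-word search: the whole word must be identical."""
    return word.lower() in words(text)


same_word = has_word("a farmer called", "farmer")   # exact word is found
plural = has_word("two farmers called", "farmer")   # 'farmers' is another word
remote = has_word("mary@somehost", "mary")          # 'id@host' is another word
```

Only same_word is true: the plural form and the network form of
the user-id each count as a different word.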






APPENDIX  A 


ON-LINE  MANUAL 


This appendix contains an on-line manual for users. It
describes the user interface of QIRM and the syntax of the
mail-query language with which users specify their requests
for retrieving messages.

NAME

QIRM  -  retrieve  mail 


SYNOPSIS 
QIRM 


DESCRIPTION 

QIRM  is  a  Query  system  for  Information  Retrieval  in  a 
Mailbox  (QIRM). 

QIRM reads messages in a user's mailbox (or messages in
both a mailbox and an old mail database file) and
constructs a database. The name of an old mail database
file is formed by concatenating a user's login-id and the
string 'data'. For example, if 'mary' is the user's
login-id, then the old mail database file is named
'marydata'. The database in 'marydata' is the one that
was constructed when QIRM was last called by the user
'mary'.
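
The naming rule and the choice of data source can be sketched as
below. This is an illustrative Python fragment, not part of
QIRM; the function names, the mailbox argument, and the
assumption that the old database file is looked up in a given
directory are all hypothetical.

```python
import os


def database_file_name(login_id):
    """The old mail database file is the login-id followed by 'data'."""
    return login_id + "data"


def data_sources(login_id, mailbox, directory="."):
    """Return the files to read: the mailbox, plus the old database if any."""
    old_database = os.path.join(directory, database_file_name(login_id))
    if os.path.exists(old_database):
        return [mailbox, old_database]  # a regular user of the system
    return [mailbox]                    # a first-time user
```

For the login-id 'mary' the first function yields 'marydata';
the second returns only the mailbox until that file exists.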

After the database is constructed, QIRM asks the user to
categorize the messages which he/she wants to retrieve.
The user can specify the fields and conditions of such
messages by posing queries in response to the system's
SELECT-WHERE prompts. The system's response to the
user's query consists of the number of messages retrieved
and, if more than one message is recalled, the first item
of the messages. In such a case, QIRM gives the number
of additional response records and asks the user about
their disposition. That is, the user can either continue
accessing the remaining messages by choosing 'y' as the
answer to the system's 'More? (y/n)' question or start a
new dialogue with the new SELECT-WHERE prompts. The user
can exit the system by entering 'stop' at the SELECT
prompt.


Simple  Queries 


The syntax of a simple query is

SELECT   ( <field>+ | * )
WHERE    ( <query-pattern> ) | ( )

Some points concerning this query are:

1.  a.  <field>+ indicates a set of one or more elements of
        the form <field>, where <field>, in turn, is one of
        sdate, rdate, from, to, subject, status, and body,
        indicating the date of sending a message, the date of
        receiving a message, the message-sender's login-id,
        the message-recipient's login-id, the subject of a
        message, the status of a message, and the body of a
        message, respectively.

    b.  '|' indicates alternation.

    c.  '*' is shorthand for all fields.


2.  2.1.  <query-pattern> has the following structure:

          <predicate-operator> <property> <propertyvalue>

          where

          a.  <predicate-operator> can be <, >, <=, >=, or =.

          b.  <property> can be sw, sd, sm, sy, rw, rd, rm,
              ry, from, to, subject, status, or text.

              'sw, sd, sm, sy' indicate the week day, the day
              number, the month, and the year of sending a
              message, respectively. 'rw, rd, rm, ry'
              indicate the week day, the day number, the
              month, and the year of receiving a message,
              respectively. 'text' indicates a phrase or
              words in a message body. (A word is a sequence
              of any characters except blank.)

          c.  1.  <propertyvalue> is the value to be searched
                  for in the named property of messages. In
                  the case of the value for the 'subject'
                  property, either the exact subject of a
                  message or a word out of the message
                  subject can be specified.

              2.  If the property value consists of more than
                  one word, it should be parenthesized;
                  otherwise, it should not.

    2.2.  ( ) indicates that 'no condition' is specified for
          retrieving messages.

3.  QIRM does not distinguish between upper and lower case
    letters.


Compound  Cluer  lee   »  « 


The  syntax  of  a  compound  query  ie 

SELECT   <  <field>«-   >    I     (*) 
WHERE    < ANDGR- query >  I  < NOT- query > 


Some  points  concerning  this  query  are : 

1.  <ANDOR-query> has the following form:

( and | or  <compound-query-pattern>++ )

where

a.  <compound-query-pattern>++ indicates the set
consisting of two or more <compound-query-
pattern>s.

b.  <compound-query-pattern> can be any of the
following:

(and  <compound-query-pattern>++ )
(or   <compound-query-pattern>++ )
(not  ( <query-pattern> ) )
( <query-pattern> )

2.  <NOT-query> has the following form:
( not ( <query-pattern> ) )
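The grammar above can be checked mechanically. The following is a minimal sketch in Python, not part of QIRM itself; it assumes that a query is represented as a nested list (for example `["=", "from", "kim"]`), and the names `is_valid_where`, `kim`, and `lee` are hypothetical.

```python
# Hypothetical validator for the compound WHERE grammar given above.
# A query is modelled as a nested Python list, e.g. ["=", "from", "kim"].

PREDICATE_OPERATORS = {"<", ">", "<=", ">=", "="}
PROPERTIES = {"sw", "sd", "sm", "sy", "rw", "rd", "rm", "ry",
              "from", "to", "subject", "status", "text"}


def is_query_pattern(q):
    """( <predicate-operator> <property> <property-value> )"""
    return (isinstance(q, list) and len(q) == 3
            and q[0] in PREDICATE_OPERATORS and q[1] in PROPERTIES)


def is_not_query(q):
    """( not ( <query-pattern> ) )"""
    return (isinstance(q, list) and len(q) == 2
            and q[0] == "not" and is_query_pattern(q[1]))


def is_andor_query(q):
    """( and | or <compound-query-pattern>++ ), with two or more operands."""
    return (isinstance(q, list) and len(q) >= 3
            and q[0] in ("and", "or")
            and all(is_compound_pattern(p) for p in q[1:]))


def is_compound_pattern(q):
    """Any of the four alternatives listed under point 1.b."""
    return is_query_pattern(q) or is_not_query(q) or is_andor_query(q)


def is_valid_where(q):
    """WHERE <ANDOR-query> | <NOT-query>"""
    return is_andor_query(q) or is_not_query(q)
```

For instance, `is_valid_where(["and", ["=", "from", "kim"], [">", "rd", 10]])` holds, whereas an `and` with a single operand is rejected because the grammar requires two or more patterns.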


Dec.  &,     1987 


APPENDIX  B 

SOURCE  CODE  LISTING 

This appendix consists of two listings of the source code
for implementing QIRM.  QIRM.l contains the main source
listing in Lisp, and ml.c contains the source listing of a
'C' function which is used to obtain the name of a user's
mailbox.
This system function causes the Lisp system to
go through the transfer tables and reset all
the appropriate links.


(sstatus translink on)


Declare special variables for the compiler.


(declare (special inport dataport outport st number MAIL))
(declare (special dummyport idname chn change done))


In order to make 'reader' case-insensitive, the
reader is modified to conform with UCI-Lisp syntax.


(cvttoucilisp)


Set  the  syntax  class  of  these  characters  to 
syntaxclass  in  the  current  readtable. 


(setsyntax '/. 'vcharacter)
(setsyntax '/' 'vcharacter)
(setsyntax '/" 'vcharacter)
(setsyntax '/[ 'vcharacter)
(setsyntax '/] 'vcharacter)
(setsyntax '// 'vcharacter)
; (a seventh setsyntax form follows; its character is illegible)


MAIN FUNCTION OF THE QIRM SYSTEM


This function begins with calling a function
'get-mailboxfile' to access a user's mailbox and check
if the mailbox is empty.  If it's empty, a proper
message is printed at the terminal and the system is
terminated.  Otherwise, this function calls 'setup-db'
which determines the source of input data and loads
the input data into a database.  If the system finds
any difference between the mail in the current mailbox
and the mail in 'old-db', it calls 'save-db-in-file'
to update 'old-db'.  'Old-db' is an old database file
containing the database which was constructed from the
most recent previous mailbox.
Finally, 'query-driver' is called in order for
(t (go more))))
((= ch -1) (return t))
(t (readc inport) (go more))))) )


This function saves the database in a file 'old-db'
in the list format of Lisp.  The file is named
by concatenating the user's login-id and the string
'data'.


(defun save-db-in-file ()
  (declare (special outport))
  (prog (n l mname)
    (setq outport (outfile (concat idname 'data)))
    (setq n 1)
    (setq l (length MAIL))
   loop
    (setq mname (concat 'msg n))
    (princ '/( outport)
    (wrt-db (get mname 'rdate))
    (wrt-db (get mname 'sdate))
    (wrt-db (get mname 'to))
    (wrt-db (get mname 'from))
    (wrt-db (get mname 'pos-linenum))
    (wrt-db (get mname 'text))
    (wrt-db (get mname 'status))
    (wrt-db (get mname 'subject))
    (princ '/) outport)
    (setq n (1+ n))
    (cond ((> n l) (close outport))
          (t (go loop)))))


This function uses 'cfasl' to load a foreign
function 'ml.c' (written in 'C') into the Lisp
system.  Using the user-id returned from 'ml.c',
this function obtains the user's mailbox-id in the
proper form through a subfunction 'id-string'.


(defun get-mailboxfile ()
  (declare (special id))
  (cfasl 'ml.o '_ml 'ml "integer-function")
  (setq id (new-vectori-byte 20))
  (setq idname (id-string (ml id) "" id 0))
  (setq inport (infile
                 (concat '//usr//spool//mail// idname))))
A function to transform the value returned from
'ml.c' to a desired format (a string).


(defun id-string (num name id i)
  (cond ((= 0 num) name)
        (t (id-string (1- num)
                      (concat name (ascii (vrefi-byte id i)))
                      id (1+ i)))))
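The conversion performed by 'id-string' can be illustrated outside Lisp. The sketch below is written in Python and is only an approximation of the listing: it assumes that the foreign function fills a 20-byte buffer and returns the number of valid bytes, and the login name `alice` is invented for the example.

```python
# Illustrative sketch of 'id-string': turn the first `num` bytes of a
# byte buffer into a string, one character at a time.

def id_string(num, buf):
    name = ""
    for i in range(num):
        name += chr(buf[i])      # same role as (ascii (vrefi-byte id i))
    return name


def mailbox_path(idname):
    # Spool directory as it appears in 'get-mailboxfile'.
    return "/usr/spool/mail/" + idname


# Hypothetical buffer, as if a C helper had written a login name into it.
buf = bytearray(20)
buf[:5] = b"alice"
```

With this buffer, `mailbox_path(id_string(5, buf))` yields the mailbox file name for the hypothetical user.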


This function first checks the existence of
'old-db'.  If 'old-db' is found, it reads a msg
in 'old-db', comparing the key portion of the same
msg of mailbox ('line') with that of 'old-db'.
If the two messages compared are identical, the contents
of the message are added to the database 'MAIL'
and the result of the comparison is returned.


(defun compare (line)
  (prog (data p msgname)
    (setq msgname (concat 'msg number))
   loop
    (setq data (read dataport))
    (cond ((null data) (return 'end))
          ((equal line (car data))
           (setq p (read-one-msg))
           (putprop msgname line 'rdate)
           (putprop msgname (nth 1 data) 'sdate)
           (putprop msgname (nth 2 data) 'to)
           (putprop msgname (nth 3 data) 'from)
           (putprop msgname (list p (cadr (nth 4 data)))
                    'pos-linenum)
           (putprop msgname (nth 5 data) 'text)
           (putprop msgname st 'status)
           (putprop msgname (nth 7 data) 'subject)
           (setq number (1+ number))
           (setq MAIL (cons msgname MAIL))
           (return 'ok))
          (t (setq change 1)
             (go loop)))))


A function to read a single msg (header + body) in
'old-db'.


(defun read-one-msg ()
  (prog (line)
   again
    (setq line (lineread inport t))
    (cond ((null line)
           (return (filepos inport)))
          (t (if (eq (car line) 'status:)
                 (setq st line))
             (go again)))))


A subfunction of 'save-db-in-file' to write
the database in 'old-db'.


(defun wrt-db (data)
  (prog (listdata)
    (if (listp (car data))
        then (patom "(" outport) (setq listdata 1))
   keep-writing
    (cond ((null data)
           (if (= listdata 1) (patom ")" outport)))
          ((atom (car data))
           (patom data outport) (terpri outport))
          (t (patom (car data) outport) (terpri outport)
             (setq data (cdr data)) (go keep-writing)))))


The function builds a database by reading mailbox as
an input file.  After the contents of each message
are constructed into the database, the message name
is added to a global variable 'MAIL'.
'Line' is the first line of a msg which was read
from 'mailbox'; 'num-line' is the number of lines
of the current header field.
'headerp' is a boolean variable to show whether the current
line is in the header part or in the body part
(1: header  0: body); 'mailnum' is an integer
indicating the msg number.


(defun build-db (line num-line headerp mailnum)
  (setq change 1)
  (prog (a name position)
    (setq name (concat 'msg mailnum))
   loop
    (cond
      ((= headerp 1)
       (setq a (lineread inport t))
       (cond ((not (null a))
              (cond ((eq '/:
                         (car (last (explode (car a)))))
                     (cond ((= num-line 1)
                            (add-db (car line) name line))
                           (t (add-db (caar line) name line)
                              (setq num-line 1)))
                     (setq line a)
                     (go loop))
                    (t
                     (cond ((= num-line 1)
                            (setq line
                                  (append1 (list line) a)))
                           (t (setq line
                                    (append1 line a))))
                     (setq num-line (1+ num-line))
                     (go loop))))
             (t (setq headerp 0)
                (setq position (filepos inport))
                (cond ((= num-line 1)
                       (add-db (car line) name line))
                      (t (add-db (caar line) name line)
                         (setq num-line 1)))
                (go loop))))
      (t (add-db 'Pos-linenum name
                 (list position (msg-body mailnum)))
         (setq MAIL (cons name MAIL))
         (cond ((= done 0)
                (setq line (lineread inport))
                (setq headerp 1)
                (setq mailnum (1+ mailnum))
                (setq name (concat 'msg mailnum))
                (go loop)))))))


A subfunction called by 'build-db' to add a line of
data to the database.


(defun add-db (field mailname lst)
  (caseq field
    (From (putprop mailname (cdr lst) 'rdate))
    (Date: (putprop mailname lst 'sdate))
    (From: (putprop mailname lst 'from))
    (To: (putprop mailname lst 'to))
    (Status: (putprop mailname lst 'status))
    (Subject: (putprop mailname lst 'subject))
    (Pos-linenum (putprop mailname lst 'pos-linenum))
    (Text (putprop mailname (cdr lst) 'text))
    (Apparently-To: (putprop mailname lst 'to))))
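The dispatch on the header keyword performed by 'add-db' can be sketched as a table lookup. The Python fragment below is an illustration only: the dictionary-based database, the function signature, and the sample message are assumptions, and unknown header fields are simply ignored, which is how the 'caseq' form without a default clause appears to behave.

```python
# Sketch of 'add-db': map the first token of a header line to the
# property under which the line is stored for a message.

FIELD_TO_PROPERTY = {
    "From": "rdate",            # the "From " envelope line (receipt date)
    "Date:": "sdate",
    "From:": "from",
    "To:": "to",
    "Apparently-To:": "to",
    "Status:": "status",
    "Subject:": "subject",
    "Pos-linenum": "pos-linenum",
    "Text": "text",
}
# For these two fields the listing stores (cdr lst), i.e. drops the keyword.
DROP_FIRST = {"From", "Text"}


def add_db(db, field, mailname, lst):
    prop = FIELD_TO_PROPERTY.get(field)
    if prop is None:
        return                   # unrecognised header: nothing is stored
    value = lst[1:] if field in DROP_FIRST else lst
    db.setdefault(mailname, {})[prop] = value


# Hypothetical usage with an invented message.
db = {}
add_db(db, "From", "msg1", ["From", "kim", "Wed", "Dec", "2", "13:54:00", "1987"])
add_db(db, "Subject:", "msg1", ["Subject:", "meeting", "notes"])
add_db(db, "X-Mailer:", "msg1", ["X-Mailer:", "foo"])
```

Both 'To:' and 'Apparently-To:' map to the same property, so a later recipient line overwrites an earlier one, as in the listing.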


'Msg-body' is a function to read the body of a msg
and add this data to the database.

'R-one' is a sub-function to read one line of a msg
body and return it.
(defun msg-body (mailnum)
  (prog (oneline text char num-textline)
    (setq num-textline 0)
    (setq text '('("")))
   loop
    (setq char (tyipeek inport)) (setq oneline (r-one))
    (cond ((= done 1)
           (add-db 'Text (concat 'msg mailnum) text)
           (return num-textline))
          ((and (= char 70) (eq (car oneline) 'From))
           (add-db 'From (concat 'msg (1+ mailnum)) oneline)
           (add-db 'Text (concat 'msg mailnum) text)
           (return num-textline))
          (t (if (not (null oneline))
                 (setq text (append1 text oneline)))
             (setq num-textline (1+ num-textline))
             (go loop)))))


(defun r-one ()
  (prog (c ch)
   lop
    (setq ch (tyipeek inport))
    (cond ((= ch 32) (readc inport) (go lop))
          ((> ch 32) (if (memq ch '(40 41))
                         then (readc inport)
                         else
                         (setq c (append1 c (ratom inport))))
                     (go lop))
          ((= ch 10) (readc inport) (return c))
          ((= ch -1) (setq done 1) (return c))
          (t (readc inport) (go lop)))))
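The line reader 'r-one' is essentially a tokenizer: blanks are skipped, parentheses are discarded, and every other run of characters becomes one word. A rough Python equivalent is sketched below. It is a simplification under stated assumptions: it reads a whole line at once rather than peeking character by character, and it reports end of file only on the call after the last line.

```python
import io


def r_one(stream):
    """Read one line from `stream`; return (words, at_end_of_file)."""
    line = stream.readline()
    if line == "":
        return [], True          # corresponds to (setq done 1)
    # Parentheses (character codes 40 and 41) are thrown away; treating
    # them as blanks also makes them end the current word.
    cleaned = line.replace("(", " ").replace(")", " ")
    return cleaned.split(), False


# Hypothetical message body used for illustration.
body = io.StringIO("Hello (world) again\n\nlast line")
```

An empty line yields an empty word list, which is why 'msg-body' appends a line to the text only when it is non-null while still counting it as a text line.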


This main function of query system scans a user-
query, passes it to the query evaluator 'qeval'
together with a copy of a stream of entries of the
database.
The result of the evaluation is a stream of selected
field-values of database entries which contain items
satisfying the query.  The final stream is
printed at the terminal by 'prt-output', 'prtmail',
'printtext1', 'printtext2', 'prtlist'.
'Prt-output' and 'prtmail' are the functions to print
output values of user-selected fields at the
terminal.
'Printtext1' and 'printtext2' are the functions for
printing a message body: the former is for the text
body with more than 22 lines and the latter is for
the body with 22 or fewer lines of text.
(defun query-driver ()
  (prog (fd field q)
   more-query
    (terpri)
    (princ "SELECT   ")
    (setq field (read))
    (if (and (listp field) (eq '* (car field))
             (null (cdr field)))
        then
        (setq fd '(rdate sdate to from subject status body))
        else (setq fd (contents field (type-check field))))
    (cond ((eq fd 'stop) (princ "***   bye bye   **")
           (return '*))
          ((eq fd 'error)
           (go more-query))
          (t (princ "WHERE    ")
             (setq q (read))
             (if (null q) then (prt-output fd MAIL)
                 else (prt-output fd (qeval q MAIL)))
             (go more-query)))))


(defun prt-output (field lst)
  (cond ((eq 'error lst)
         (princ "**  ERROR  -  Unknown expression  **"))
        (t (if (= 0 (length lst))
               then (msg "****   No message Recalled!   ****")
                    (terpri) (terpri)
               else
               (msg "****   " (length lst)
                    " message Recalled!  ****")
               (terpri) (terpri)
               (msg "£. " (car lst)) (terpri)
               (prtmail field lst)))))


(defun prtmail (field lst)
  (prog (fld m-num output field2 linenum)
    (setq field2 field)
   keep-printing
    (setq fld (car field2))
    (setq m-num (car lst))
    (cond ((eq fld 'body)
           (if (null (get m-num 'text))
               then (princ "**  no message body found  **")
                    (terpri)
               else
               (setq linenum (nth 1 (get m-num 'pos-linenum)))
               (if (> linenum 22)
                   then
                   (printtext1 (nth 0 (get m-num 'pos-linenum))
                               linenum)
                   else
                   (printtext2 (nth 0 (get m-num 'pos-linenum))
                               linenum))))
          ((eq fld 'sdate) (princ "sdate: ")
           (princ (cdr (get m-num fld))) (terpri))
          ((eq fld 'rdate) (princ "rdate: ")
           (princ (cdr (get m-num fld))) (terpri))
          (t (setq output (get m-num fld))
             (cond ((null output)
                    (msg "**  no " fld " found  **"))
                   (t (if (atom (car output))
                          (princ output)
                          (prtlist output))))
             (terpri)))
    (setq field2 (cdr field2))
    (if (null field2)
        then (terpri)
        (if (neq 1 (length lst))
            then (msg (1- (length lst))
                      " message left!  More?(y//n) ")
            (if (yes) then
                (terpri)
                (msg "£. " (cadr lst))
                else (setq lst '())))
        (terpri)
        (setq lst (cdr lst))
        (setq field2 field))
    (if (null lst) (terpri) (go keep-printing))))


(defun printtext2 (pos linenum)
  (fseek inport pos 0)
  (terpri)
  (prog (a)
   loop
    (setq a (readc inport))
    (princ a)
    (cond ((eq a (ascii 10))
           (setq linenum (sub1 linenum))
           (if (= 0 linenum) (return t) (go loop)))
          (t (go loop)))))


(defun printtext1 (pos linenum)
  (fseek inport pos 0)
  (terpri)
  (prog (a n)
    (setq n 0)
   loop
    (setq a (readc inport))
    (princ a)
    (cond ((eq a (ascii 10)) (setq n (add1 n))
           (setq linenum (sub1 linenum))
           (if (> linenum 0)
               then (if (>= n 20)
                        then (princ "  MORE")
                             (if (eq (tyi) 10)
                                 (go loop)
                                 (return t))
                        else (go loop))
               else (if (= 0 linenum) (return t) (go loop))))
          (t (go loop)))))


(defun prtlist (lst)
  (cond ((not (null lst)) (princ (car lst)) (terpri)
         (prtlist (cdr lst)))))


'Yes', 'type-check', and 'contents' are utility
functions.  'Yes' returns 'true' if a user types
'y'.  Otherwise, this function returns 'nil'.
'Contents' and 'type-check' are functions called by
'query-driver' to check the syntax of a user query.


(defun yes ()
  (if (equal (read) 'y) t nil))


(defun type-check (exp)
  (cond ((eq exp 'stop) 'stop)
        ((or (atom exp) (null exp))
         (msg "**  Unknown expression type --  **")
         'error)
        (t exp)))


(defun contents (exp1 exp2)
  (cond ((eq 'error exp2) 'error)
        ((eq 'stop exp2) 'stop)
        ((null exp2) exp1)
        ((memq (car exp2) (q-field-list))
         (contents exp1 (cdr exp2)))
        (t (msg "**  Unknown field -- " exp2 "  **")
           'error)))
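The checking of the SELECT clause by 'type-check' and 'contents', together with the expansion of (*) done in 'query-driver', can be summarised in one routine. The following Python sketch is illustrative only; the function name `select_fields` and the use of Python lists and strings in place of Lisp lists and atoms are assumptions.

```python
# Sketch of SELECT-clause checking: 'stop' ends the session, (*) expands
# to every field, and any unknown field name is an error.

Q_FIELDS = ("rdate", "sdate", "subject", "status", "to", "from", "body")
ALL_FIELDS = ["rdate", "sdate", "to", "from", "subject", "status", "body"]


def select_fields(expr):
    if expr == "stop":
        return "stop"
    if not isinstance(expr, list) or not expr:
        return "error"           # an atom or an empty list is rejected
    if expr == ["*"]:
        return list(ALL_FIELDS)
    for field in expr:
        if field not in Q_FIELDS:
            return "error"       # unknown field name
    return expr
```

For example, `select_fields(["from", "subject"])` returns the list unchanged, while `select_fields(["sender"])` returns `"error"` and the driver would prompt for a new query.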


'Qeval' takes a query and a stream of a copy of
database entries as inputs.  It classifies the
different types of expressions and dispatches to
an appropriate function for each.
'Check-domain' is called for integrity-checking
of property-values of user-queries.


(defun qeval (q frame)
  (if (atom q) 'error
      (caseq (car q)
        (and (if (null (cddr q)) then 'error
                 else (conjoin (cdr q) frame)))
        (or  (if (null (cddr q)) then 'error
                 else (disjoin (cdr q) frame)))
        (not (negate (cdr q) frame))
        (=   (equate (nth 0 (cdr q)) (nth 1 (cdr q)) frame))
        (<=  (less-eq (check-domain (cdr q)
                                    (cadr q) (caddr q)) frame))
        (>=  (greater-eq (check-domain (cdr q)
                                       (cadr q) (caddr q)) frame))
        (<   (less (check-domain (cdr q) (cadr q) (caddr q))
                   (cadr q) (caddr q) frame))
        (>   (greater (check-domain (cdr q)
                                    (cadr q) (caddr q))
                      (cadr q) (caddr q) frame))
        (t 'error))))
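The dispatching style of 'qeval' can be shown on a reduced example. The Python sketch below handles only 'and', 'or', 'not', and '='; the ordering operators are omitted, equality is simplified to word membership, and the dictionary database with messages `msg1` to `msg3` is invented for illustration. It is not a transcription of the listing.

```python
# Reduced sketch of a query evaluator working on a frame (list of
# message names).  Every operator maps a frame to a filtered frame.

def equate(db, prop, value, frame):
    """Keep the messages whose property contains the given word."""
    return [name for name in frame if value in db[name].get(prop, [])]


def qeval(db, q, frame):
    if not isinstance(q, list) or not q:
        return "error"
    op, args = q[0], q[1:]
    if op == "and":                       # conjuncts filter one after another
        if len(args) < 2:
            return "error"
        for sub in args:
            frame = qeval(db, sub, frame)
            if frame == "error":
                return "error"
        return frame
    if op == "or":                        # disjuncts are evaluated separately
        if len(args) < 2:
            return "error"
        result = []
        for sub in args:
            part = qeval(db, sub, frame)
            if part == "error":
                return "error"
            result += [n for n in part if n not in result]
        return result
    if op == "not":                       # frame minus the matching entries
        if len(args) != 1:
            return "error"
        hit = qeval(db, args[0], frame)
        if hit == "error":
            return "error"
        return [n for n in frame if n not in hit]
    if op == "=":
        if len(args) != 2:
            return "error"
        return equate(db, args[0], args[1], frame)
    return "error"


# Hypothetical three-message database.
db = {
    "msg1": {"from": ["kim"], "subject": ["budget", "meeting"]},
    "msg2": {"from": ["lee"], "subject": ["lunch"]},
    "msg3": {"from": ["kim"], "subject": ["lunch"]},
}
mail = ["msg1", "msg2", "msg3"]
```

Evaluating `["and", ["=", "from", "kim"], ["=", "subject", "lunch"]]` against `mail` narrows the frame first to the messages from kim and then to `["msg3"]`, which mirrors how 'conjoin' feeds the output of one conjunct into the next.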


(defun check-domain (q property value)
  (cond ((memq property '(rm sm))
         (if (memq value (monthlist)) q 'error))
        ((memq property '(rd sd ry sy))
         (if (numberp value) q 'error))
        ((memq property '(rw sw))
         (if (memq value (weeklist)) q 'error))))


This function takes a property-name and an entry
(a message), and returns the value of the given
property of the message.


(defun get-property-value (property l)
  (caseq property
    (rw (nth 1 (get l 'rdate)))
    (rm (nth 2 (get l 'rdate)))
    (rd (nth 3 (get l 'rdate)))
    (ry (get_pname (implode (cddr
                    (explode (nth 5 (get l 'rdate)))))))
    (sw (nth 1 (get l 'sdate)))
    (sm (nth 3 (get l 'sdate)))
    (sd (nth 2 (get l 'sdate)))
    (sy (nth 4 (get l 'sdate)))
    (t (cdr (get l property)))))
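The positions used by 'get-property-value' suggest how the two date lines are laid out. The Python sketch below makes that explicit; note that the header layouts are an assumption inferred from the indices in the listing (an envelope line of the form "From sender Www Mmm dd hh:mm:ss yyyy" with the leading keyword already removed, and a "Date: Www, dd Mmm yy ..." header), and the sample message is invented.

```python
# Sketch of property extraction by position.  'rdate' holds the envelope
# line without its leading "From"; 'sdate' holds the whole "Date:" line.

def get_property_value(msg, prop):
    r = msg.get("rdate", [])
    s = msg.get("sdate", [])
    if prop == "rw":
        return r[1]
    if prop == "rm":
        return r[2]
    if prop == "rd":
        return r[3]
    if prop == "ry":
        return str(r[5])[2:]     # drop the first two digits: 1987 -> "87"
    if prop == "sw":
        return s[1]              # note the trailing comma, e.g. "Wed,"
    if prop == "sd":
        return s[2]
    if prop == "sm":
        return s[3]
    if prop == "sy":
        return s[4]
    return msg.get(prop, [])[1:]  # other fields: drop the header keyword


# Hypothetical message, tokenised line by line.
msg = {
    "rdate": ["kim", "Wed", "Dec", "2", "13:54:00", "1987"],
    "sdate": ["Date:", "Wed,", "2", "Dec", "87", "13:54:00", "EST"],
    "subject": ["Subject:", "budget", "meeting"],
}
```

Under this layout the sending week day carries a trailing comma, which would explain why the equality routines append a punctuation character to the queried value before comparing it for 'sw'.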


'Equate' handles the query with '=' as its predicate
operator.
If the given property is 'text', then it
calls 'atom-text-equate' or 'list-text-equate'
according to the type of property-value which a user
specified in his query.  That is, if the property-
value consists of more than one word, 'list-text-
equate' is called.  Otherwise, 'atom-text-equate' is
called.  A word indicates a sequence of any
characters except blank.

If the given property is 'subject', 'subject-equate'
is called.

For the queries excluding those with 'text' or
'subject' as their properties, 'other-equate' is
called to handle a query with '='.


(defun equate (property qvalue l)
  (cond ((eq property 'text)
         (if (atom qvalue)
             (atom-text-equate qvalue l)
             (list-text-equate qvalue l)))
        ((eq property 'subject)
         (subject-equate property qvalue l))
        ((memq property
               '(to from subject status rw rd sw rm sm ry sy))
         (other-equate property qvalue l))
        (t 'error)))


(defun other-equate (pty qvalue l)
  (prog (frame prop-value)
    (cond ((eq pty 'sw) (setq qvalue (concat qvalue ",")))
          ((eq pty 'ry)
           (setq qvalue (get_pname (concat qvalue "")))))
   loop
    (setq prop-value (get-property-value pty (car l)))
    (cond ((and (atom qvalue) (listp prop-value)
                (or (memq qvalue prop-value)
                    (memq (concat qvalue ",")
                          prop-value)))
           (setq frame (cons (car l) frame)))
          ((and (atom prop-value)
                (equal qvalue prop-value))
           (setq frame (cons (car l) frame))))
    (setq l (cdr l))
    (if (null l) (return frame) (go loop))))


(defun subject-equate (pty qvalue l)
  (prog (frame prop-value)
   loop
    (setq prop-value (get-property-value pty (car l)))
    (cond ((and (atom qvalue)
                (or (memq qvalue prop-value)
                    (memq (concat qvalue ",") prop-value)
                    (memq (concat qvalue ".") prop-value)))
           (setq frame (cons (car l) frame)))
          ((and (listp qvalue) (equal qvalue prop-value))
           (setq frame (cons (car l) frame))))
    (setq l (cdr l))
    (if (null l) (return frame) (go loop))))


(defun atom-text-equate (qvalue lst)
  (prog (frame prop-value found)
   loop1
    (setq prop-value (get (car lst) 'text))
    (prog (onelist)
     loop2
      (cond ((null prop-value) (setq found 0))
            ((listp (car prop-value))
             (setq onelist (car prop-value))
             (if (or (memq qvalue onelist)
                     (memq (concat qvalue '/,) onelist)
                     (memq (concat qvalue '/.) onelist))
                 then (setq found 1)
                 else (setq prop-value (cdr prop-value))
                      (go loop2)))
            (t
             (if (or (eq (car prop-value) qvalue)
                     (eq (car prop-value)
                         (concat qvalue '/,))
                     (eq (car prop-value)
                         (concat qvalue '/.)))
                 then (setq found 1)
                 else (setq prop-value (cdr prop-value))
                      (go loop2)))))
    (if (= found 1)
        then (setq frame (cons (car lst) frame)))
    (setq lst (cdr lst))
    (if (null lst) (return frame) (go loop1))))


(defun list-text-equate (qvalue lst)
  (prog (frame prop-value found)
   loop1
    (setq found 0)
    (setq prop-value (get (car lst) 'text))
    (prog (q qatom old-prop onelist)
      (setq q qvalue)
     loop2
      (setq qatom (car q))
      (cond ((null prop-value) (setq found 0))
            ((listp (car prop-value))
             (setq onelist (car prop-value))
             (if (or (memq qatom onelist)
                     (memq (concat qatom '/,) onelist)
                     (memq (concat qatom '/.) onelist))
                 then (setq found
                            (confirm qvalue onelist prop-value 0))
                 else (setq prop-value (cdr prop-value))
                      (go loop2)))
            (t
             (if (or (eq (car prop-value) qatom)
                     (eq (car prop-value)
                         (concat qatom '/,))
                     (eq (car prop-value)
                         (concat qatom '/.)))
                 then (if (= found 0)
                          then (setq found 1)
                               (setq old-prop prop-value))
                      (setq q (cdr q))
                      (setq prop-value (cdr prop-value))
                      (if (null q) (setq found 2)
                          (go loop2))
                 else (if (= found 1)
                          then (setq prop-value (cdr old-prop))
                               (setq q qvalue)
                               (setq found 0)
                          else
                          (setq prop-value (cdr prop-value)))
                      (go loop2)))))
    (if (= found 2) then
        (setq frame (cons (car lst) frame)))
    (setq lst (cdr lst))
    (if (null lst) (return frame) (go loop1))))


(defun confirm (qvalue onelist p-value f)
  (prog (word qatom oldp-value sameq)
    (setq sameq qvalue)
   loop
    (setq qatom (car sameq))
    (cond ((null p-value) (return 0))
          (t
           (setq word (car onelist))
           (if (or (eq word qatom)
                   (eq word (concat qatom '/,))
                   (eq word (concat qatom '/.)))
               then (if (= f 0)
                        then
                        (setq oldp-value p-value) (setq f 1))
                    (setq sameq (cdr sameq))
                    (setq onelist (cdr onelist))
                    (if (null onelist)
                        then (setq p-value (cdr p-value))
                             (setq onelist (car p-value)))
                    (if (null sameq) (return 2) (go loop))
               else (if (= f 1)
                        then (setq p-value (cdr oldp-value))
                             (setq onelist (car p-value))
                             (if (null onelist)
                                 then (setq p-value (cdr p-value))
                                      (setq onelist (car p-value)))
                             (setq sameq qvalue)
                             (setq f 0)
                        else (setq onelist (cdr onelist))
                             (if (null onelist)
                                 then (setq p-value (cdr p-value))
                                      (setq onelist (car p-value))))
                    (go loop))))))
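The phrase search carried out by 'list-text-equate' and 'confirm' looks for the queried words in consecutive positions of the message body, where a body word may carry a trailing comma or period and a phrase may continue on the next line. A compact way to express the same idea is sketched below in Python. It is a simplification, not the author's algorithm: the body lines are flattened into one word sequence and scanned with a sliding window instead of the explicit backtracking used in the listing.

```python
# Sketch of phrase matching over a tokenised message body.

def word_matches(body_word, query_word):
    """A body word matches with or without a trailing ',' or '.'."""
    return body_word in (query_word, query_word + ",", query_word + ".")


def phrase_in_body(phrase, lines):
    """True if the words of `phrase` occur consecutively in the body."""
    words = [w for line in lines for w in line]   # ignore line boundaries
    n = len(phrase)
    if n == 0:
        return False
    for start in range(len(words) - n + 1):
        if all(word_matches(words[start + k], phrase[k]) for k in range(n)):
            return True
    return False


# Hypothetical two-line body.
lines = [["Please", "send", "the", "budget"],
         ["report,", "then", "call", "me."]]
```

Here `phrase_in_body(["budget", "report"], lines)` succeeds even though the phrase spans a line break and the second word is followed by a comma.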


'Or' queries are handled by 'disjoin' and
'merge1'.  The output streams for the various
disjuncts of the 'or' are computed separately
and then merged.


(defun disjoin (q-list frame)
  (cond ((null q-list) '())
        (t (merge1 (qeval (car q-list) frame)
                   (disjoin (cdr q-list) frame)))))


(defun merge1 (lst1 lst2)
  (cond ((or (eq lst1 'error) (eq lst2 'error)) 'error)
        ((null lst2) lst1)
        ((memq (car lst2) lst1) (merge1 lst1 (cdr lst2)))
        (t (cons (car lst2) (merge1 lst1 (cdr lst2))))))


'And' queries are handled by this function.
'Conjoin' takes as inputs the conjuncts and a
stream of entries, and returns the stream of
filtered entries.


(defun conjoin (q-list entries)
  (cond ((eq entries 'error) 'error)
        ((null q-list) entries)
        (t (conjoin (cdr q-list)
                    (qeval (car q-list) entries)))))


'Not' queries are handled by the 'negate' function.
Given a query and a stream of entries, it returns
a stream of entries which don't contain property-
values satisfying the queries.


(defun negate (q frame)
  (diff frame (qeval (car q) frame)))


(defun diff (lst1 lst2)
  (cond ((eq lst2 'error) 'error)
        ((null lst2) lst1)
        (t (diff (del (car lst2) lst1) (cdr lst2)))))


(defun del (elem lst)
  (cond ((null lst) '())
        ((eq elem (car lst)) (del elem (cdr lst)))
        (t (cons (car lst) (del elem (cdr lst))))))
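The list operations behind 'or' and 'not' are an order-preserving union and a difference. The Python sketch below mirrors 'merge1' and 'diff' under the assumption that frames are plain lists of message names and that the atom 'error' is represented by the string "error"; the sample names are invented.

```python
# Sketch of 'merge1' (union) and 'diff' (difference) on frames.

def merge1(lst1, lst2):
    """Elements of lst2 that are not in lst1, followed by lst1."""
    if lst1 == "error" or lst2 == "error":
        return "error"
    return [x for x in lst2 if x not in lst1] + lst1


def diff(lst1, lst2):
    """lst1 with every element of lst2 removed ('del' applied repeatedly)."""
    if lst2 == "error":
        return "error"
    return [x for x in lst1 if x not in lst2]
```

For example, `merge1(["msg1", "msg2"], ["msg2", "msg3"])` gives `["msg3", "msg1", "msg2"]`, and `diff(["msg1", "msg2", "msg3"], ["msg2"])` gives `["msg1", "msg3"]`. Because an error value propagates through both operations, a malformed sub-query invalidates the whole compound query.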


'Less' handles the query with '<' as its predicate
operator.  Given a query, property-name and
property-value specified in the query, and a
stream of entries, 'less' returns a filtered
stream of entries which contain property values
satisfying the query-condition.
'Mw-less' is called for the query with sm, rm,
rw, and sw as its property-name.
'Num-less' is for the query with rd, sd, and sy
as its property-name.  'St-less' is for the query
with ry.


(defun less (q pty value frame)
  (if (eq q 'error) 'error
      (cond ((memq pty '(rm sm rw sw))
             (mw-less pty frame (get-list2 pty value)))
            ((memq pty '(rd sd sy))
             (num-less pty value frame))
            ((eq pty 'ry)
             (st-less pty
                      (get_pname (concat value "")) frame))
            (t 'error))))


(defun mw-less (field frame lst)
  (cond ((null lst) '())
        (t (merge1 (equate1 field (car lst) frame)
                   (mw-less field frame (cdr lst))))))


(defun num-less (field qvalue l)
  (prog (frame prop-value)
   loop
    (setq prop-value (get-property-value field (car l)))
    (if (< prop-value qvalue)
        (setq frame (cons (car l) frame)))
    (setq l (cdr l))
    (if (null l) (return frame) (go loop))))
(defun st-less (field qvalue l)
  (prog (frame prop-value)
   loop
    (setq prop-value (get-property-value field (car l)))
    (if (alphalessp prop-value qvalue)
        (setq frame (cons (car l) frame)))
    (setq l (cdr l))
    (if (null l) (return frame) (go loop))))


'Less-eq' handles a query with '<=' as its predicate
operator.


(defun less-eq (q entries)
  (merge1 (less q (car q) (cadr q) entries)
          (equate (nth 0 q) (nth 1 q) entries)))


'Greater' is a function to handle the query with
'>' as its predicate operator.  For the query with
'rm, sm, rw, and sw' as its property-name, 'mw-gr'
is called.  The query having 'rd, sd, and sy' is
handled by a function, 'num-gr'.  'St-gr' takes care
of the query with 'ry'.


(defun greater (q field value frame)
  (if (eq q 'error) 'error
      (cond ((memq field '(rm sm rw sw))
             (mw-gr field frame (get-list4 field value)))
            ((memq field '(rd sd sy))
             (num-gr field value frame))
            ((eq field 'ry)
             (st-gr field
                    (get_pname (concat value "")) frame))
            (t 'error))))


(defun mw-gr (field frame lst)
  (cond ((null lst) '())
        (t (merge1 (equate1 field (car lst) frame)
                   (mw-gr field frame (cdr lst))))))


(defun num-gr (field qvalue l)
  (prog (frame prop-value)
   loop
    (setq prop-value (get-property-value field (car l)))
    (if (> prop-value qvalue)
        (setq frame (cons (car l) frame)))
    (setq l (cdr l))
    (if (null l) (return frame) (go loop))))


(defun st-gr (field qvalue l)
  (prog (frame prop-value)
   loop
    (setq prop-value (get-property-value field (car l)))
    (if (and (not (alphalessp prop-value qvalue))
             (not (equal prop-value qvalue)))
        (setq frame (cons (car l) frame)))
    (setq l (cdr l))
    (if (null l) (return frame) (go loop))))


The query having '>=' as its predicate-operator
is handled by 'greater-eq'.


(defun greater-eq (q frame)
  (merge1 (greater q (car q) (cadr q) frame)
          (equate (nth 0 q) (nth 1 q) frame)))


'Get-list2' is a function called by a function,
'less'.  Given a property-name and a month or
week, 'get-list2' calls 'get-list1' to obtain
the list of months/week-names which are less than the
given month or week and returns the list.


(defun get-list1 (element lst)
  (if (eq (car lst) element) '()
      (cons (car lst) (get-list1 element (cdr lst)))))


(defun get-list2 (field value)
  (cond ((memq field '(rw sw))
         (get-list1 value (weeklist)))
        ((memq field '(rm sm))
         (get-list1 value (monthlist)))))


'Get-list4' is a function called by 'greater'.
Given a property-name and a month or week name,
this function calls 'get-list3' to obtain a list
of months/week names which are greater than given
month/week name, and returns the list.


(defun get-list3 (element lst)
  (if (eq (car lst) element) (cdr lst)
      (get-list3 element (cdr lst))))


(defun get-list4 (field value)
  (cond ((memq field '(rw sw))
         (get-list3 value (weeklist)))
        ((memq field '(rm sm))
         (get-list3 value (monthlist)))))
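
As a rough illustration of what the four 'get-list' functions compute,
the following Python sketch reproduces their results for names that do
occur in the week or month list. The helper names names_before,
names_after, and names_for are assumptions introduced for this example
and do not appear in the QIRM source.

```python
MONTHS = ["jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec"]
WEEKDAYS = ["sun", "mon", "tue", "wed", "thu", "fri", "sat"]


def names_before(element, names):
    """Counterpart of get-list1: the names that precede `element`."""
    return names[:names.index(element)]


def names_after(element, names):
    """Counterpart of get-list3: the names that follow `element`."""
    return names[names.index(element) + 1:]


def names_for(field):
    """Pick the week list for rw/sw and the month list for rm/sm."""
    if field in ("rw", "sw"):
        return WEEKDAYS
    if field in ("rm", "sm"):
        return MONTHS
    return None  # like a cond with no matching clause, which yields nil


def get_list2(field, value):
    names = names_for(field)
    return None if names is None else names_before(value, names)


def get_list4(field, value):
    names = names_for(field)
    return None if names is None else names_after(value, names)


print(get_list2("rw", "wed"))  # ['sun', 'mon', 'tue']
print(get_list4("sm", "oct"))  # ['nov', 'dec']
```

One difference is deliberate: for a name that is not in the list, this
sketch raises ValueError, whereas the recursive Lisp definitions have no
clause for reaching the end of the list.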


'Equate1' is a subfunction called by 'mw-gr' or
'mw-less' to obtain a list of entries which
contains the given property-values (qvalue) under
the given properties (field) such as rw, rm, sw,
and sm.


(defun equate1 (field qvalue l)
  (prog (frame prop-value)
     (cond ((eq field 'sw)
            (setq qvalue (concat qvalue ")"))))
   loop
     (setq prop-value (get-property-value field (car l)))
     (cond ((eq prop-value qvalue)
            (setq frame (cons (car l) frame))))
     (setq l (cdr l))
     (if (null l) (return frame) (go loop))))


(defun q-field-list ()
  '(rdate sdate subject status to from body))


(defun monthlist ()
  '(jan feb mar apr may jun jul aug sep oct nov dec))


(defun weeklist ()
  '(sun mon tue wed thu fri sat))


(trace  (ureal  I  I 
for autoloading
(setq user-top-level 'start)


Dec  8 14:03 1987    Page 1


#include <ctype.h>
#include <stdio.h>
#include <pwd.h>
#include <utmp.h>

char *my_name;
char *getlogin();
struct passwd *getpwuid();
int i, j;

/* Copy the user's login name into a and return its length. */
ml(a)
char a[];
{
    my_name = getlogin();

    /* Fall back to the password file when no login name is available. */
    if (my_name == NULL || strlen(my_name) == 0) {
        struct passwd *pwent;
        pwent = getpwuid(getuid());
        my_name = pwent->pw_name;
    }

    j = 0;
    for (i = 0; my_name[i]; i++)
        a[j++] = my_name[i];
    return (j);
}


ABSTRACT

An electronic mailbox system can organize messages into a
database according to their contents and provide users with a
facility for categorizing desired message(s) for retrieval.
The Query system for Information Retrieval in a Mailbox
(QIRM) has been designed based upon such concepts.

QIRM increases the selectivity of information retrieved
by allowing users to specify their requests using a mail-query
language resembling SQL.  Also, QIRM provides some
insights for developing an intelligent Unix mail facility by
employing several techniques of Artificial Intelligence such
as semantic networks, property lists, and matching.  The
database in QIRM is based on the semantic network structure
and also utilizes an indexing scheme for the purpose of fast
searching.  The mail-query language is based on the
tuple calculus and provides a user-friendly interface and
pattern-directed access to the messages.  The query
processor of QIRM employs the network-fragment-matching
mechanism and utilizes the stream of property lists.


